INTRODUCTION



The fact that actuarial science is fundamentally
a branch of biology rather than of mathematics is 
overlooked far more generally than ought to be the 
case. Most people, even those of education and wide 
culture, are inclined to look upon an actuary as a
particularly crabbed, narrow, and intellectually dusty 
kind of mathematician. In reality his subject is one 
of the liveliest in the whole domain of biology, and 
none surpasses it in its practical interest and import- 
ance to mankind. Because, what the actuary is, or 
at least should be, trying always to formulate more
and more definitely are the laws which determine 
the duration of human life. Why the actuary in fact 
is too often intellectually but little more than a sort
of glorified computer, is really only the result of a 
defect in the teaching of biology in our colleges and 
universities. It has only lately come to be recognized 
anywhere that a biologist needed a substantial founda- 
tion in mathematics in order successfully to practice 
a biological profession. It is not too rash a prediction 
to say that presently the time is coming when no 
important actuarial post will be held by a mathe- 
matician who knows little or no biology. The vigor 
and originality of his biological outlook will be valued 
as highly as the rigidity of his mathematical sub- 
structure now is. 

The thing which chiefly makes this book by my
friend Arne Fisher notable, lies, in a broad sense,
in the fact that it is a highly original and absolutely
novel essay in general biology. The language is to a
considerable extent mathematical, to be sure, but the
subject matter, the mode of logical approach, and the 
significant conclusion — all these are pure biology. 
Unfortunately many biologists will not be able to 
appreciate its significance, or even to read it intel- 
ligently. But this is their loss, and at the same time 
an exposure of the dire poverty of their intellectual 
equipment for dealing with the problems of their 
science. 

There are two broad features of Fisher's work 
which want emphasis. The first is the successful
construction of a life table from a knowledge of deaths
alone. That the construction is successful his results 
set forth in this book abundantly demonstrate. To 
have done this is a mathematical and actuarial 
achievement of the first rank. It may fairly be 
regarded as fundamentally the most significant ad- 
vance in actuarial theory since Halley. It opens out 
wonderful possibilities of research on the laws of 
mortality, in directions which have hitherto been 
wholly impossible of attack. The criterion by which 
the significance of a new technique in any branch of 
science is evaluated, is just this of the degree to which 
it opens up new fields of research. By this criterion 
Fisher's work stands in a high and secure position. 

But of vastly more significance considered purely 
as an intellectual achievement is his discovery of 
the fundamental biological law relating the several 
causes of death to each other, which made the technical
accomplishment possible. More than one accepted
text book on vital statistics has scornfully instructed
its readers that no good whatever could come from 
any tabulation or study of death ratios; that they must 
be avoided as the pestilence by any statistician who 
would be orthodox. But orthodoxy and discovery are 
as incompatible intellectually as oil and water are 
physically, a cosmic law often overlooked by our
"safe and sane" scientific gentry. This book is an
outstanding demonstration that this law is still in 
operation. Fisher has had the temerity to study the 
ratios of deaths from one cause or group of causes 
to those from another group, or to all causes together, 
and has discovered that there abides a real and
hitherto unsuspected lawfulness in these ratios. Here 
again his pioneer work opens out alluring vistas to 
the thoughtful biometrician.

Altogether we of America are to be warmly 
congratulated that this brilliant Danish mathematical 
biologist has chosen to come and live with us.

Baltimore, November 1921. 

Raymond Pearl. 



AUTHOR'S PREFACE 



The classical method of measuring mortality rests
essentially upon the fundamental principles first 
enunciated by the British astronomer, Halley, in his 
construction of the famous Breslau Life Table. Since 
the time of Halley this method has been so thoroughly 
investigated and has been perfected to such an extent 
that new developments along this line cannot be 
expected. Any improvements on the original principles 
of Halley are after all nothing but refinements in 
graduating methods; and even in this line it appears 
that the limit of further perfection has been reached. 

Halley's method, which is purely empirical in 
scope and principle, rests primarily upon the know- 
ledge of the number of persons exposed to risk at 
various ages and the correlated number of deaths 
among such exposures. In all cases where such
information is at hand the old and tried method meets 
all requirements to our full satisfaction; and it would 
appear superfluous to try to supplant it with fun- 
damentally different principles. 

In presenting the new method outlined in this 
little book I wish to state most emphatically that it 
has never been my intention to try to supersede the 
conventional methods of construction of mortality
tables wherever such methods are applicable. My 
proposed method is only a supplement to the former 
tools of statisticians and actuaries, and aims to
utilize numerous statistical materials to which the 
older system of Halley is not applicable. The idea, 
whether it is new or not, meets in reality a very 
frequent need in mortality investigations. It is a well 
known fact that in the determination of certain 
statistical ratios, it is easier to determine the nume- 
rator than the denominator, as for instance in life 
or sickness assurance, where the losses can be 
ascertained with a very close degree of accuracy, 
while the collection of persons exposed to risk at 
various ages is often difficult to obtain. Similar
remarks hold true in the case of numerous statistical 
summaries of mortuary records as published in most
government reports on vital statistics. The desire to 
utilize this enormous statistical material was what 
led me to try the proposed method. 

In principle the plan is fundamentally different 
from that of the empirical method of Halley, inasmuch 
as I have attempted to substitute the inductive
principle for that of pure empiricism. 

In the first place, I consider the $d_x$ curve, or the
number of deaths by attained ages among the
survivors of an original cohort of say 1,000,000
entrants at age 10, as being generated as a compound
curve of a limited number (say 8 or less) of subsidiary
component curves of either the Laplacean-Charlier or
Poisson-Charlier type.

The method of induction now consists in deter- 
mining the constants or parameters of these sub- 
sidiary curves. These parameters fall into two 
separate categories: — 

A. The statistical characteristics or semi-invariants
which determine the relative frequency distribution
by attained age at death, as expressed by the
mean, the dispersion, the skewness and the excess
of each subsidiary or component curve.

B. The areas of each subsidiary or component 
curve. 

The working hypothesis which I have put forward
is that the relative frequency distribution of deaths by
attained ages, classified according to a limited number of
groups (generally 8 or less) of causes of death among the
survivors of the original cohort of entrants, tends to cluster
around certain ages in such a way that it is possible from
biological considerations to estimate in practice with a
sufficiently close degree of approximation the statistical
characteristics or semi-invariants of the relative frequency
distributions of the component curves, corresponding to a
previously chosen classification of causes of death (into 8
or less subsidiary groups).

This implies briefly that I suppose it is possible 
from biological considerations to select a priori the 
statistical characteristics of the category as mentioned 
above under A. 

Once this hypothesis is accepted as a true supposition,
the areas of each of the component curves can
be determined by purely deductive methods (as for
instance the method of least squares) from the
observed values of the proportionate death ratios
$R_B(x)$ ($x = 10, 11, 12, \ldots, 100$; $B$ = I, II, III, ...)
corresponding to the groups of causes of death.

Thus the parameters as determined in this
manner exhaust the given statistical material, i.e.
the observed proportionate death ratios $R_B(x)$. A
mere addition of the subsidiary or component curves
gives us then the compound $d_x$ curve from which it
is an easy task to find the functions $l_x$ and $q_x$.
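
The last step, passing from the compound $d_x$ curve to $l_x$ and $q_x$, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `life_table_from_deaths`, the toy death counts, and the convention that the cohort is extinct after the last tabulated age are not taken from the text. The sketch uses the usual life-table relations, namely that $l_x$ is the sum of all deaths at age $x$ and later, and that $q_x = d_x / l_x$.

```python
def life_table_from_deaths(deaths, start_age=10):
    """Build (age, l_x, d_x, q_x) rows from a column of deaths d_x.

    Assumption (hypothetical, for illustration): deaths[i] is d_x for
    age start_age + i, and nobody survives beyond the last age listed,
    so the number of entrants equals the sum of all deaths.
    """
    rows = []
    survivors = sum(deaths)  # l_x at the entry age
    for offset, d in enumerate(deaths):
        q = d / survivors if survivors > 0 else 0.0  # q_x = d_x / l_x
        rows.append((start_age + offset, survivors, d, q))
        survivors -= d  # l_(x+1) = l_x - d_x
    return rows


if __name__ == "__main__":
    # Toy numbers only, not data from the book.
    for age, l_x, d_x, q_x in life_table_from_deaths([100, 200, 300, 400]):
        print(age, l_x, d_x, round(q_x, 4))
```

Note that no exposed-to-risk figures enter the calculation: once the $d_x$ column is known, the remaining columns follow by summation and division alone.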

The scheme as we have briefly outlined it above
is, therefore, not a cut-and-dried doctrine or a sort
of "mathematical alchemy" as some of my critics
have implied. Nor is it an authoritative or infallible
dogma. The keystone upon which its success depends 
is merely a working hypothesis; i.e. a temporary or 
preliminary supposition. I suppose something to be 
true and try to ascertain whether, in the light of that 
supposed truth, certain facts fit together better than 
they do with any other supposition hitherto tried. 

The validity of the working hypothesis must, in
my opinion, be proved or disproved either by
independent methods and principles of construction
of mortality tables, such as for instance the empirical
principle of Halley, hitherto exclusively used by the
actuaries, or through additional biological studies.*



* The biological basis of Mr. Fisher's working hypothesis, which is
of far greater importance than the purely ancillary mathematical
deduction, has apparently been overlooked by many of his American critics,
such as Little, Thompson and Carver. Dr. Carver in the Proceedings
of the Casualty Actuarial Society of America (Vol. VI, page 357)
remarks that "if we can construct a table from death alone as in Proc.
Vol. IV, and by dividing these deaths by $q_x$, determine the
unenumerated population - why not the converse?"

The answer to this remark is obvious. In the case of mortuary
records, Fisher considered two different and distinct attributes, namely
1) the purely quantitative attribute of attained age at death, and 2) the
purely biological attribute of cause of death, which in conjunction with
the working hypothesis to a certain extent aims to replace the unknown
exposures. If we were to follow Dr. Carver's facetious suggestion and, to
use his phrase, "go the proposed plan one better by using enumerated
populations only", we should, however, encounter a statistical series with
the single attribute of attained age only, but no second attribute
corresponding to that of the biological factor of the cause of death. Criticisms
of the sort of Dr. Carver's bring to light the fundamentally different
principles applied by Mr. Fisher in sharp contradistinction to the purely
empirical methods of the orthodox actuary and statistician.

Translator.

In the meantime I feel justified in presenting to
my readers the practical results obtained by this
method, which although perhaps not unimpeachable
in respect to mathematical rigour, nevertheless in my
opinion offers a means to attack a vast bulk of
collected statistical data against which our former
actuarial tools proved useless. The celebrated Russian
mathematician Tchebycheff once made a remark to
the effect that in the antique past the Gods proposed
certain problems to be solved by man, later on the
problems were presented by halfgods and great men,
while now dire necessity forces us to seek some
solution to numerous practical problems connected
with our daily conduct. The problem towards which
I have made an attempt to offer a sort of solution in
the present little essay is one of these numerous
problems of dire necessity mentioned by Tchebycheff,
and I hope that my work along this line, imperfect
as it is, may nevertheless prove a beginning towards
more improved methods in the same direction.

In conclusion I wish to extend my thanks to a
number of friends and colleagues both in America
and Europe and Japan who have kept on encouraging
me in my work along these lines in spite of much
adverse criticism from certain statistical and actuarial
circles. I wish in this connection to thank Mr. F. L.
Hoffman, Statistician of the Prudential Insurance
Company, for permitting me to apply the method to
various collections of mortuary records while working
as a computer in his department. My thanks are also
due to Mr. E. A. Vigfusson for making the translation
from my rough Danish notes. If the resulting
English is perhaps open to criticism, I beg to remind
the reader that my original manuscript was written
in Danish and translated into English by an Icelander,
while the composition and proof reading was done
by a Copenhagen firm.

To Professor Glover of the University of Michigan 
I also wish to extend my thanks for inviting me to 
deliver a series of lectures on the construction of 
mortality tables before his classes in actuarial 
methods during the month of March 1919. This 
invitation afforded me the first opportunity to bring 
the proposed method before a professional body of 
statistical readers. 

Last but not least I desire to acknowledge my 
obligations to Professor Pearl whose introductory 
note I consider the strongest part of the book. In 
these departments of knowledge the appreciation of 
one's peers is after all the only real reward one can 
possibly expect. The fact that this eminent biologist 
has recognized that the nucleus of the whole problem 
is of a purely biological nature, and that the 
mathematical analysis is merely ancillary, is 
particularly pleasing to me, because it represents my
own view in this particular matter. 

p. t. Newark, U. S. A., November 1921.

Arne Fisher, 



TRANSLATOR'S PREFACE 



During the spring of 1919 the attention of the 
present writer was called to a brief paper entitled 
Note on the Construction of Mortality Tables by means of 
Compound Frequency Curves by the Danish statistician,
Mr. Arne Fisher. The novelty and originality of this 
paper impressed me to such an extent that I became 
desirous of obtaining more detailed information about 
the process than that which necessarily was contained 
in the above summary note, originally printed in the 
Proceedings of the Casualty and Actuarial Society of
America. 

I wrote therefore to Mr. Fisher and inquired 
whether he intended to publish any further studies 
on this subject. From his reply I learned that he had
delivered a series of lectures on this very topic before 
Professor Glover's insurance classes at the University 
of Michigan during the month of March 1919, but that
the proposed method had been met with such captious 
opposition in certain actuarial circles that he had 
decided to abandon the plan of publishing anything 
further on the subject and had even destroyed the 
English notes prepared for the Michigan lectures. 

In the meantime the proposed scheme had 
received considerable attention in actuarial circles in 
Europe and Japan and several highly commendatory 
reviews had appeared in the English and Continental
insurance periodicals and various scientific journals,
notably the Journal of the Royal Statistical Society and
the Bulletin de l'Association des Actuaires Suisses. The
proposed method seemed indeed so novel and unique 
that I could not help feeling that it deserved a 
better fate than that of being forgotten. I sug- 
gested therefore to Mr. Fisher that he prepare a 
new manuscript. But unfortunately his time did not 
allow this. He consented, however, to turn over to 
me his original Danish notes on the subject from 
which he had prepared his Michigan lectures and 
permitted me to make an English translation for the
Scandinavian Insurance Magazine. I gladly availed 
myself of this opportunity to bring this fundamental 
work before an international body of readers and 
started on the translation in the summer of 1919. 

At the same time Mr. Fisher decided to put the 
proposed method and working hypothesis to a very 
severe test, which would meet even the most stringent 
requirements of some of his critics and their contention
that the method would fail in the case of a
rapidly changing population group. For this purpose 
he selected a series of statistical data contained in the 
annual reports and statements of a number of the 
leading Japanese Life Assurance Offices, relating to 
their mortuary records for the four year period from 
1914-1917. More than 35,000 records of male lives,
arranged according to the Japanese list of causes of
death and grouped in quinquennial age intervals
formed the basis for the construction of the final
life table which was completed in November 1919.
This table, which like Mr. Fisher's other tables was
derived without any information of the number of
lives exposed to risk at various ages, is shown in the
addenda of this treatise. 

Immediately after its construction Mr. Fisher sent 
this table to the well known Japanese actuary, Mr. 
T. Yano, and asked him for an opinion regarding the 
trustworthiness of the final death rates of $q_x$ as
derived by his new method. The Japanese actuary's 
answer arrived in April 1920. Mr. Yano had after 
the receipt of Mr. Fisher's letter ascertained the 
exposures and deaths among male lives at each 
separate age for about 40 Japanese life offices during
the period 1914-1917 and constructed by means of
the conventional methods a complete series of $q_x$ by
integral ages from age 10 to 90. These ungraduated
data are shown as a broken line polygon in the
appended diagram (Figure 1). In spite of the fact that
Mr. Fisher had no information whatever about the
exposed to risk the agreement of the continuous curve
of $q_x$ as determined by the frequency curve method
with Mr. Yano's ungraduated data is so close that
I think further comments superfluous. The slight
differences in younger ages might indeed arise from
the fact that Mr. Yano had access to all the experience
(containing more than 45,000 deaths) of all the
Japanese companies, whereas Fisher only used the
mortuary records as published by some of the leading
Japanese companies.

[Fig. 1.]

Like all scientific methods of induction Mr. Fisher's
proposed plan rests upon a working hypothesis,
namely that it is possible from biological considerations
to group the deaths among the survivors at
various ages in any mortality table according to
causes in such a manner that their percentage or
relative frequency distribution according to attained
age at death will conform to a previously selected
system or family of Laplacean-Charlier or Poisson-Charlier
frequency curves. Mr. Fisher himself is very
frank in stating that this is a working hypothesis
upon which hinges the success of the whole method.
One of the main objections of his critics is that it 
seems impossible to prove the truth of this working 
hypothesis. Naturally its truth cannot be proved by 
mathematics or logic any more than we can prove
or disprove the existence of Euclidean space, which
in itself constitutes a working hypothesis for most 
of our applied mathematics. Mr. Fisher's critics might 
as well be asked to prove or disprove Newton's 
hypothetical laws of motion and attraction as 
extended by Maxwell and Hertz, or the newer 
hypothesis recently put forward by the relativists,
or the Lorentz hypothesis of contraction. It would
indeed be a terrific blow to science and the extension
of knowledge if it were required that no working
hypothesis should be allowed in scientific work unless
such hypothesis could be proved to be true. What
position would biology occupy to-day if biologists had 
insisted that Darwin's great hypothesis be proved 
before it could be allowed as a foundation in the study 
of evolution? 

The most convincing answer to Mr. Fisher's 
captious critics among the old school of actuaries 
and statisticians is, however, the undisputed fact that 
his working hypothesis as such really does work. 
As pointed out by Dr. Pearl in the introductory note
of this book the results set forth in the present
treatise abundantly demonstrate this fact. The six
widely different mortality tables as shown in the 
addenda stand as mute and yet as the most eloquent 
evidence to the fact that the method works. It might 
indeed not appear impertinent to suggest that Mr. 
Fisher's actuarial critics would render a greater
service to their profession by proving that these six
mortality tables cannot be considered as reasonable
approximations to tables derived by orthodox means
from the same population groups than by starting
to pooh-pooh and ridicule his proposed method.

Winnipeg, Canada, November 1921. 

E. A. Vigfusson. 



"Nothing is less warranted in science than an uninqui- 
ring and unhoping spirit. In matters of this kind, those 
who despair are almost invariably those who have never 
tried to succeed." 

W. Stanley Jevons. 



CHAPTER I 

(TRANSLATED BY MISS DICKSON) 



AN INTRODUCTION TO THE THEORY OF 
FREQUENCY CURVES 

1. INTRODUCTION

The following method of constructing mortality tables from
mortuary records by sex, age and cause of death rests
essentially upon the
theory of frequency curves originally introduced
by the great Laplace and of recent years further
developed and extended through the elegant and
far reaching researches of the Scandinavian school
of statisticians under the leadership of Gram,
Charlier and Thiele and their disciples. This
method is, however, comparatively little known
and unfortunately not always fully appreciated
by the majority of English statisticians and actuaries,
who prefer to apply the well known
methods of the eminent English biometrician,
Karl Pearson. For this reason it may be advisable
to give a preliminary sketch of Charlier's methods
so as to obtain a better understanding of the
following chapters dealing with the more specific
problem of mortality tables. The treatment must
necessarily be brief and represents essentially an
outline of the more detailed theory which I hope
to present in my forthcoming second volume of
the Mathematical Theory of Probabilities.

By the method of Charlier any frequency 
function is expressed as an infinite series rather 
than as a closed and compact algebraic or tran- 
scendental expression by the Pearsonian methods. 
By power series the thoughts of the majority of 
students are associated with the famous series 
which bear the names of Taylor and Maclaurin. 
In these series the function is derived as an in- 
finite series of ascending powers of the inde- 
pendent variable whose coefficients are expressed 
by means of the correlated successive derivatives
of the function for specific values of x. Thus
for instance we know that the Maclaurin series
may be written as follows:

$$f(x) = f(0) + \frac{x}{1!} f'(0) + \frac{x^2}{2!} f''(0) + \ldots + \frac{x^n}{n!} f^{(n)}(0) + \ldots$$

where $f^{(n)}(0)$ is the symbol for the value of the $n$th
derivative when $x = 0$ and $n = 1, 2, 3, 4, \ldots$
There are, however, contrary to the belief of
many immature students, only comparatively few
functions which allow a rigorous expansion by this
method, in which the derived functions and the
differential calculus play the leading roles.

But on the other hand there are other methods 
of expansions in infinite series which are more 
general and by which the coefficients of the in- 
dependent variable are expressed by operations 
other than those of differentiation. One of these 
methods is to express the coefficients as definite 
integrals either of the unknown function itself or 
some auxiliary function. 
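
To make the idea of "coefficients as definite integrals" concrete, here is a minimal Python sketch. It assumes the standard Gram-Charlier Type A form, in which a density $f(x)$ is written as $\varphi_0(x) \sum_n c_n He_n(x)$ with $\varphi_0$ the standard normal density and $He_n$ the Hermite polynomials, so that $c_n = \frac{1}{n!} \int f(x)\, He_n(x)\, dx$ by orthogonality. That specific form, the helper names, the trapezoidal integration and the test density are assumptions chosen for illustration; the text above has not yet stated its own formulas.

```python
import math


def hermite(n, x):
    """Hermite polynomial He_n(x), built by the recurrence
    He_(k+1)(x) = x * He_k(x) - k * He_(k-1)(x)."""
    previous, current = 1.0, x
    if n == 0:
        return previous
    for k in range(1, n):
        previous, current = current, x * current - k * previous
    return current


def integrate(func, lower, upper, steps=4000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


def charlier_coefficient(density, n, lower=-12.0, upper=12.0):
    """c_n = (1/n!) * integral of density(x) * He_n(x) dx (assumed form)."""
    area = integrate(lambda x: density(x) * hermite(n, x), lower, upper)
    return area / math.factorial(n)


def normal_density(mean, sd):
    """Return the normal density with the given mean and dispersion."""
    def density(x):
        z = (x - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return density


if __name__ == "__main__":
    shifted = normal_density(0.5, 1.0)  # illustrative density only
    for n in range(4):
        print(n, round(charlier_coefficient(shifted, n), 6))
```

No derivative of the density is ever taken: each coefficient is obtained by summing observed (or tabulated) values against a fixed auxiliary function, which is why the approach suits finite statistical data.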

The range of practical problems which lay 
themselves open to a successful attack along those 
lines is much wider than the corresponding range 
of practical problems to which we may apply the 
Taylor series. 

Speaking generally as a layman (who continu- 
ously has to face practical rather than abstract 
problems) and specifically as a mathematical 
novice (who considers mathematics as a means 
rather than as an end) this fact appears to me 
quite obvious from a purely philosophical point of 
view. In nature and in all practical observations 
we encounter finite and not infinitesimal quantit- 
ies. In other words, what we actually observe are 
finite sums or definite integrals, i. e. the limit of 
a sum of infinitely small component parts. 

The definite integral rather than the derivative 
and the differential seems, therefore, to be the 
more elementary and primitive operation and the
one which suggests itself first hand. History of 
Mathematics indeed proves this contention. Ar- 
chimedes had (as shown by the researches of the 
Danish scholar, Heiberg) laid the essential foun- 
dation for an integral calculus about 500 B. C. 
And nearly 25 centuries later, almost simultane- 
ously with the historical discovery of Heiberg an- 
other Scandinavian, the Swedish mathematician 
and actuary, Fredholm, gave to the world his 
epochmaking work on integral equations. Fredholm's
monumental memoir "Sur une nouvelle
méthode pour la résolution du problème de Dirichlet"
was first published in the "Öfversigt af Akademiens
Förhandlingar" (Stockholm 1900). Measured
by time the subject of integral equations is
thus a mere infant in the history of mathematical 
discoveries. Measured by its importance it has 
already become a classic. Its application to a 
steadily increasing number of essentially practical 
problems in almost every branch of science has 
placed it in a central position of modern mathe- 
matical research and it bids fair to become the 
most important branch of mathematics. 

Fredholm in introducing his now famous in- 
finite determinants, known as the Fredholmean 
determinants, had a forerunner in the Danish 
actuary, Gram, whose Doctor's dissertation "Om
Rækkeudviklinger ved de mindste Kvadraters Metode"
(Copenhagen 1879) gave prominence to a
certain class of functions which later on have 
become known as orthogonal functions, and by 
which Gram actually gave the first expansion of 
a frequency distribution or frequency curve in 
an infinite series. Scandinavians in general and 
Scandinavian actuaries in particular may, there- 
fore, feel proud of their share of imparting know- 
ledge on this important subject, which makes a 
strong bid to place mathematics on a higher plane
than ever before, not alone as an abstract but 
equally well as an applied science. The genius 
of the Italian renaissance Leonardo da Vinci, as 
early as 1479 proclaimed "that no part of human 
knowledge could lay claim to the title of science 
before it had passed through the stage of mathematical
demonstration". Comparatively few branches
of learning measure up to the standard of
Leonardo da Vinci, and our learned friends among 
the economists and sociologists have a long road 
to travel before they succeed in placing their 
methods in the coveted niche of science. But the 
new vistas of possibilities opened up to them by 
means of M. Fredholm's discovery ought to 
furnish them a powerful tool towards the attain- 
ment of the high standard set by the great Italian. 
The principal theorems of integral equations 
are bound to be especially fruitful in their application
to mathematical statistics and the problems
of frequency curves and frequency surfaces
together with the associated problems of mathematical
correlation.

2. FREQUENCY DISTRIBUTIONS AND FUNCTIONS

If $N$ successive observations originating from the same
essential circumstances or the
same source of causes are made in respect to a
certain statistical variate, $x$, and if the individual
observations $o_i$ ($i = 1, 2, 3, \ldots, N$) are permuted
in an ascending order then this particular permutation
is said to form a frequency distribution
of $x$ and is denoted by the symbol $F(x)$.

The relative frequencies of this specific permutation,
that is the ratio which each absolute
frequency or group of frequencies bears to the
total number of observations, is called a relative
frequency function or probability function and is
denoted by the symbol $\varphi(x)$.

If the statistical variate is continuous or a
graduated variate, such as heights of soldiers,
ages at death of assured lives, physical and astronomical
precision measurements, etc., then

$$\varphi(z)\,dz$$

is the probability that the variate $x$ satisfies the
following relation

$$z - \tfrac{1}{2}dz < x < z + \tfrac{1}{2}dz$$

or that $x$ falls between the above limits.

If the statistical variate assumes integral (discrete)
values only such as the number of alpha
particles radiated from certain metals and radioactive
gases as polonium and helium, number of
fin rays in fishes, or number of petals in flowers,
then $\varphi(z)$ is the probability that $x$ assumes
the value $z$. From the above definitions it follows
a fortiori that

(a) $F(z) = N\varphi(z)$ (Integral variates)

(b) $F(z)\,dz = N\varphi(z)\,dz$ (Integrated variates)

Interpreting the above results graphically we 
find that (a) will be represented by a series of 
disconnected or discrete points while (b) will be 
represented by a continuous curve. 

As to the function $\varphi(z)$ we make for the
present no other assumptions than those following
immediately from the customary definition of
a mathematical probability. That is to say the
function $\varphi(z)$ must be real and positive.

Moreover it must also satisfy the relation

$$\int_{-\infty}^{+\infty} \varphi(z)\,dz = 1,$$

or in the case of discrete variates:

$$\sum_{z=-\infty}^{z=+\infty} \varphi(z) = 1,$$

which is but the mathematical way of expressing
the simple hypothetical disjunctive judgment that
the variate is sure to assume some one or several
values in the interval from $-\infty$ to $+\infty$. The
zero point is arbitrarily chosen and need not coincide
with the natural zero of the number scale.
Thus for instance if we in the case of height of
recruits choose the zero point of the frequency
curve at 170 centimeters an observation of 180
centimeters would be recorded as +10 and an
observation of 160 centimeters as -10.
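
The definitions of this section can be tried out on a small discrete example. The sketch below is illustrative only: the function names, the `zero_point` argument and the eight recruit heights are hypothetical and do not come from the text. It tabulates the absolute frequencies $F(z)$ with $z$ measured from a chosen zero point of 170 centimeters, derives the relative frequencies $\varphi(z)$, and checks relation (a), $F(z) = N\varphi(z)$, together with the condition that the $\varphi(z)$ sum to unity.

```python
from collections import Counter


def frequency_distribution(observations, zero_point=0):
    """Return {z: F(z)} in ascending order of z, where z is the
    observation measured from the arbitrarily chosen zero point."""
    counts = Counter(o - zero_point for o in observations)
    return dict(sorted(counts.items()))


def relative_frequencies(distribution):
    """Return {z: phi(z)}, each absolute frequency divided by N."""
    total = sum(distribution.values())
    return {z: count / total for z, count in distribution.items()}


if __name__ == "__main__":
    heights_cm = [180, 160, 170, 170, 180, 170, 165, 175]  # made-up data
    F = frequency_distribution(heights_cm, zero_point=170)
    phi = relative_frequencies(F)
    N = len(heights_cm)
    print(F)                      # 180 cm appears under z = +10, 160 cm under z = -10
    print(sum(phi.values()))      # the relative frequencies sum to unity
    print(all(F[z] == N * phi[z] for z in F))  # relation (a)
```

Moving the zero point only relabels the values of $z$; the frequencies themselves are unchanged.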



3. PROPERTY OF CONSTANTS OR PARAMETERS

In regard to a frequency function we may assume a priori
that it will depend only upon
the variate $x$ and certain mathematical relations
into which this variate enters with a number of
constants $\lambda_1, \lambda_2, \lambda_3, \lambda_4, \ldots$, symbolically
expressed by the notation

$$F(x, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \ldots)$$

where the $\lambda$'s are the constants and $x$ the variate.
All these constants or parameters are naturally
independent of $x$ and represent some peculiar properties
or characteristic essentials of the frequency
function as expressed in the original observations
$o_i$ ($i = 1, 2, 3, \ldots, N$). We may, therefore,
say that each constant or statistical parameter
entering into the final mathematical form for the
frequency function is a function of the observations
$o_i$. This fact may be expressed in the following
symbolic form:

$$\begin{aligned}
\lambda_1 &= S_1(o_1, o_2, o_3, \ldots, o_N) \\
\lambda_2 &= S_2(o_1, o_2, o_3, \ldots, o_N) \\
&\;\;\vdots \\
\lambda_N &= S_N(o_1, o_2, o_3, \ldots, o_N).
\end{aligned}$$

But from purely a priori considerations we 
are able to tell something else about the function 
S . (i==l, 2, 3 .... N). It is only when per- 
muting the various o's in an ascending magnitude 
according to the natural number scale that we 
obtain a frequency function. This arrangement 
itself has, however, no influence upon any one 
of the o's which were generated before this purely 
arbitrary x>ermutation took place. The ultimate 
and previously measured effects, of the causes as 
reflected in each individual numerical observa- 
tions, 0^, depend only upon the origin of causes 
which form the fundamental basis for the stati- 
stical object under investigation and do not depend 
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upon the order in which the individual o's occur 
in the series of observations. 

Suppose for instance that the observations occurred in the
following order

$$o_1, o_2, o_3, \ldots, o_N.$$

By permuting these elements in their natural order we obtain the
frequency distribution $F(x)$. But the very same distribution
could have been obtained if the observations had occurred in any
other order as for instance

$$o_7, \ldots, o_3, \ldots, o_1,$$

so long as all of the individual $o$'s were retained in the
original records. Or to take a concrete example, consider the
study of the number of policyholders according to attained ages
in a life assurance office. We write the age of each individual
policyholder on a small card. When all the ages have been written
on individual cards they may be permuted according to attained
age and the resulting series is a frequency function of the age
$x$. We may now mix these cards just as we mix ordinary playing
cards in a game of whist, and we get another permutation, in
general different from the order in which we originally recorded
the ages on the cards. But this new permutation can equally well
be used to produce the frequency function if we are only sure to
retain all the cards and do not add any new cards.

4. PARAMETERS VIEWED AS SYMMETRIC FUNCTIONS

The various functions $S(o_1, o_2, o_3, \ldots, o_N)$ are
therefore symmetric functions, that is functions which are left
unaltered by arbitrarily permuting the $N$ elements $o$, and no
interchange whatever of the values of the various $o$'s in those
symmetric functions can have any influence upon the final form of
the frequency function or frequency curve, $F(x)$.

We now introduce under the name of power sums a certain well
known form of fundamental symmetrical functions defined by the
following relations

$$s_0 = o_1^0 + o_2^0 + o_3^0 + \ldots + o_N^0 = N$$
$$s_1 = o_1^1 + o_2^1 + o_3^1 + \ldots + o_N^1 = \sum o_i$$
$$s_2 = o_1^2 + o_2^2 + o_3^2 + \ldots + o_N^2 = \sum o_i^2$$
$$\ldots$$
$$s_r = o_1^r + o_2^r + o_3^r + \ldots + o_N^r = \sum o_i^r.$$

Moreover, a well known theorem in elementary algebra tells us
that every symmetric function may be expressed as a function of
$s_1, s_2, s_3, \ldots, s_N$.

From this theorem it follows a fortiori that we are able to
express the constants $\lambda$ in the frequency curve as
functions of the power sums of the observations. While such a
procedure is possible, theoretically at least, we should,
however, in most cases find it a very tedious and laborious task
in actual practice. It, therefore, remains to be seen whether it
is possible to transform these symmetrical functions of the power
sums of the observations into some other symmetric functions,
which are more flexible and workable in practical computations
and which can be expressed in terms of the various values of $s$.
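That the power sums are symmetric functions, and so do not depend on the order in which the cards happen to lie, can be checked with a short Python sketch. The list of ages is hypothetical and the function name `power_sum` is our own choice.

```python
import random


def power_sum(observations, r):
    """s_r: the sum of the r-th powers of the observations."""
    return sum(o ** r for o in observations)


# Hypothetical attained ages written on the cards (illustrative only).
ages = [23, 35, 35, 41, 52, 52, 52, 60]

# Mix the cards; a fixed seed keeps the example reproducible.
shuffled = ages[:]
random.Random(0).shuffle(shuffled)

s_original = [power_sum(ages, r) for r in range(5)]      # s_0 .. s_4
s_shuffled = [power_sum(shuffled, r) for r in range(5)]  # same cards, new order
```

Both lists of power sums agree term by term, and `s_original[0]` is simply the number of cards N.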

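5. THIELE'S SEMI-INVARIANTS

It is the great achievement of Thiele to have been the first
mathematician to realize this possibility and make this
transformation by introducing into the theory of frequency curves
a peculiar system of symmetrical functions which he called
semi-invariants and denoted by the symbols

$$\lambda_1, \lambda_2, \lambda_3, \ldots$$

Starting with power sums, $s_i$, Thiele defines these by the
following identity

$$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots} = s_0 + \frac{s_1\omega}{1!} + \frac{s_2\omega^2}{2!} + \frac{s_3\omega^3}{3!} + \ldots \qquad (1)$$

which is identical in respect to $\omega$.

Since $s_r = \sum o_i^r$ the right hand side of the equation may
also be written as

$$e^{o_1\omega} + e^{o_2\omega} + e^{o_3\omega} + \ldots = \sum e^{o_i\omega}.$$

Differentiating (1) with respect to $\omega$ we have

$$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots}\left[\lambda_1 + \frac{\lambda_2\omega}{1!} + \frac{\lambda_3\omega^2}{2!} + \ldots\right] = \left[s_0 + \frac{s_1\omega}{1!} + \frac{s_2\omega^2}{2!} + \ldots\right]\left[\lambda_1 + \frac{\lambda_2\omega}{1!} + \frac{\lambda_3\omega^2}{2!} + \ldots\right]$$
$$= s_1 + \frac{s_2\omega}{1!} + \frac{s_3\omega^2}{2!} + \frac{s_4\omega^3}{3!} + \ldots$$

Multiplying out and equating the various coefficients of equal
powers of $\omega$ we finally have

$$s_1 = \lambda_1 s_0$$
$$s_2 = \lambda_1 s_1 + \lambda_2 s_0$$
$$s_3 = \lambda_1 s_2 + 2\lambda_2 s_1 + \lambda_3 s_0$$
$$s_4 = \lambda_1 s_3 + 3\lambda_2 s_2 + 3\lambda_3 s_1 + \lambda_4 s_0$$

where the coefficients follow the law of the binomial theorem.

Solving for $\lambda$ we have

$$\lambda_1 = s_1 : s_0$$
$$\lambda_2 = (s_2 s_0 - s_1^2) : s_0^2$$
$$\ldots$$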
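The binomial law of the coefficients suggests a simple recursive computation of the semi-invariants from the power sums. The following Python sketch is one possible arrangement, not a scheme taken from the text: it assumes the general relation $s_{r+1} = \sum_{k=0}^{r} \binom{r}{k} \lambda_{k+1} s_{r-k}$, of which the four equations above are the first cases, and solves it successively for $\lambda_1, \lambda_2, \ldots$ The observations and the function name are hypothetical.

```python
from fractions import Fraction
from math import comb


def semi_invariants(observations, order):
    """Return [lambda_1, ..., lambda_order] computed from s_0 .. s_order."""
    s = [sum(Fraction(o) ** r for o in observations) for r in range(order + 1)]
    lam = []  # lam[k] holds lambda_{k+1}
    for r in range(order):
        # s_{r+1} = sum_k C(r, k) * lambda_{k+1} * s_{r-k}; the k = r term
        # contains the unknown lambda_{r+1}, all other terms are known.
        known = sum(comb(r, k) * lam[k] * s[r - k] for k in range(r))
        lam.append((s[r + 1] - known) / s[0])
    return lam


# Hypothetical observations (illustrative only).
obs = [1, 2, 2, 3, 7]
lam = semi_invariants(obs, 4)
```

Exact rational arithmetic is used so that the first semi-invariant can be compared directly with the mean and the second with the mean square deviation.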



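The semi-invariants $\lambda$ in respect to an arbitrary origin
and unit are as we noted defined by the relation

$$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots} = e^{o_1\omega} + e^{o_2\omega} + e^{o_3\omega} + \ldots$$

where $o_1, o_2, o_3, \ldots$ are the individual observations.

Let us now change to another coordinate system with another unit
and origin defined by the following linear transformations:

$$o'_i = a o_i + c \qquad (i = 1, 2, 3, \ldots).$$

The semi-invariants in this new system are given by the relation

$$s_0\, e^{\frac{\lambda'_1\omega}{1!} + \frac{\lambda'_2\omega^2}{2!} + \frac{\lambda'_3\omega^3}{3!} + \ldots} = e^{o'_1\omega} + e^{o'_2\omega} + e^{o'_3\omega} + \ldots = e^{(a o_1 + c)\omega} + e^{(a o_2 + c)\omega} + \ldots$$

Since the various values of $\lambda'$ do not depend upon the
quantity $\omega$ we may without changing the value of the
semi-invariants replace $\omega$ by $\omega : a$ in the above
equations, which gives

$$s_0\, e^{\frac{\lambda'_1\omega}{a\,1!} + \frac{\lambda'_2\omega^2}{a^2\,2!} + \frac{\lambda'_3\omega^3}{a^3\,3!} + \ldots} = e^{(a o_1 + c)\frac{\omega}{a}} + e^{(a o_2 + c)\frac{\omega}{a}} + e^{(a o_3 + c)\frac{\omega}{a}} + \ldots$$
$$= e^{\frac{c\omega}{a}}\left[e^{o_1\omega} + e^{o_2\omega} + e^{o_3\omega} + \ldots\right] = e^{\frac{c\omega}{a}}\, s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots}$$

Taking the logarithms on both sides of the equation we have

$$\frac{\lambda'_1\omega}{a\,1!} + \frac{\lambda'_2\omega^2}{a^2\,2!} + \frac{\lambda'_3\omega^3}{a^3\,3!} + \ldots = \frac{c\omega}{a} + \frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots$$

Differentiating successively with respect to $\omega$ we have

$$\frac{\lambda'_1}{a} + \frac{\lambda'_2\omega}{a^2} + \frac{\lambda'_3\omega^2}{2a^3} + \ldots = \frac{c}{a} + \lambda_1 + \lambda_2\omega + \frac{\lambda_3\omega^2}{2} + \ldots$$
$$\frac{\lambda'_2}{a^2} + \frac{\lambda'_3\omega}{a^3} + \ldots = \lambda_2 + \lambda_3\omega + \ldots$$
$$\frac{\lambda'_3}{a^3} + \frac{\lambda'_4\omega}{a^4} + \ldots = \lambda_3 + \lambda_4\omega + \ldots$$

Letting $\omega = 0$ we therefore have

$$\frac{\lambda'_1}{a} = \frac{c}{a} + \lambda_1 \quad\text{or}\quad \lambda'_1 = a\lambda_1 + c$$
$$\frac{\lambda'_2}{a^2} = \lambda_2 \quad\text{or}\quad \lambda'_2 = a^2\lambda_2$$
$$\frac{\lambda'_3}{a^3} = \lambda_3 \quad\text{or}\quad \lambda'_3 = a^3\lambda_3$$

from which we deduce the following relations

$$\lambda_1(ax + c) = a\lambda_1(x) + c$$
$$\lambda_r(ax + c) = a^r\lambda_r(x) \quad\text{for } r > 1,$$

which shows how the semi-invariants change by introducing a new
origin and a new unit.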
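These transformation rules are easily checked numerically. The Python sketch below does so for the first four semi-invariants, using the well known expressions of these quantities by the mean and the central moments (an identity assumed here rather than derived in the text). The observations and the constants `a` and `c` are hypothetical.

```python
from fractions import Fraction


def first_four_semi_invariants(observations):
    """lambda_1..lambda_4 via the mean and the central moments m_2, m_3, m_4."""
    n = len(observations)
    mean = Fraction(sum(observations), n)

    def central(r):
        return sum((Fraction(o) - mean) ** r for o in observations) / n

    # Assumed identities: lambda_2 = m_2, lambda_3 = m_3, lambda_4 = m_4 - 3*m_2^2.
    return [mean, central(2), central(3), central(4) - 3 * central(2) ** 2]


# Hypothetical change of unit (a) and origin (c).
a, c = Fraction(5, 2), Fraction(-7)
obs = [1, 2, 2, 3, 7]

lam = first_four_semi_invariants(obs)
lam_new = first_four_semi_invariants([a * o + c for o in obs])

# lambda_1(ax + c) = a*lambda_1(x) + c and lambda_r(ax + c) = a^r * lambda_r(x), r > 1.
expected = [a * lam[0] + c] + [a ** r * lam[r - 1] for r in (2, 3, 4)]
```

Only the first semi-invariant feels the change of origin; all the higher ones are merely multiplied by the corresponding power of the new unit.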

We shall for the present leave the semi-invariants and only ask
the reader to bear in mind the above relations between $\lambda$
and $s$, of which we shall later on make use in determining the
constants in the frequency curve $\varphi(x)$.

6. THE FOURIER INTEGRALS

Before discussing the generation of the frequency curve it will,
however, be necessary to demonstrate some auxiliary mathematical
formulae from the theory of definite integrals and integral
equations which will be of use in the following discussion as
mathematical tools with which to attack the collected statistical
data or the numerical observations.

One of these tools is found in the celebrated integral theorem by
Fourier, which was the first integral equation to be successfully
treated. We shall in the following demonstration adhere to the
elegant and simple solution by M. Charlier. Charlier in his proof
supposes that a function, $F(\omega)$, is defined through the
following convergent series

$$F(\omega) = \alpha\left[f(0) + f(\alpha)e^{\alpha i\omega} + f(2\alpha)e^{2\alpha i\omega} + \ldots + f(-\alpha)e^{-\alpha i\omega} + f(-2\alpha)e^{-2\alpha i\omega} + \ldots\right]$$

or

$$F(\omega) = \alpha\sum_{m=-\infty}^{m=+\infty} f(\alpha m)\,e^{\alpha m i\omega} \qquad (2)$$

where $i = \sqrt{-1}$.

We then see by the well known theorem of Cauchy that the integral

$$I(\omega) = \int_{-\infty}^{+\infty} f(x)\,e^{x i\omega}\,dx \qquad (3)$$

is finite and convergent. If we now let $m\alpha = x$ and let
$\alpha = 0$ as a limiting value, $\alpha$ becomes equal to $dx$
and $f(\alpha m) = f(x)$. Consequently we may write

$$\lim_{\alpha = 0} F(\omega) = I(\omega).$$

Multiplying (2) by $e^{-r\alpha i\omega}\,d\omega$ and integrating
between the limits $-\pi/\alpha$ and $+\pi/\alpha$ we get on the
left an expression of the form

$$\int_{-\pi/\alpha}^{+\pi/\alpha} F(\omega)\,e^{-r\alpha i\omega}\,d\omega$$

and on the right a sum of definite integrals of which, however,
all but the term containing $f(r\alpha)$ as a factor will vanish.
This particular term reduces to

$$\alpha f(r\alpha)\int_{-\pi/\alpha}^{+\pi/\alpha} d\omega \quad\text{or}\quad 2\pi f(r\alpha).$$

Hence we have

$$f(r\alpha) = \frac{1}{2\pi}\int_{-\pi/\alpha}^{+\pi/\alpha} F(\omega)\,e^{-r\alpha i\omega}\,d\omega. \qquad (4a)$$

By letting $\alpha$ converge toward zero and by the substitution
$r\alpha = x$ this equation reduces to

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} I(\omega)\,e^{-x i\omega}\,d\omega. \qquad (4b)$$

Charlier has suggested the name conjugated Fourier function of
$f(x)$ for the expression $F(\omega)$. We then have, if we
introduce a new function $\psi(\omega)$ defined by the simple
relation

$$\sqrt{2\pi}\,\psi(\omega) = \lim_{\alpha = 0} F(\omega) = I(\omega),$$

$$\psi(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} f(x)\,e^{x i\omega}\,dx, \qquad (5a)$$

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} \psi(\omega)\,e^{-x i\omega}\,d\omega. \qquad (5b)$$

The equations (5a) and (5b) are known as integral equations of
the first kind. The expression $e^{x i\omega}$ (or
$e^{-x i\omega}$) is known as the nucleus of the equation. If in
(5b) we know the value of $\psi(\omega)$ we are able to determine
$f(x)$. Inversely, if we know $f(x)$ we may find $\psi(\omega)$
from (5a).
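The reciprocity of (5a) and (5b) may be illustrated numerically. The Python sketch below replaces both integrals by finite sums over an equidistant grid (much in the spirit of Charlier's series with a small spacing) and takes $f(x) = e^{-x^2/2}$ as a test function. The grid constants, the test function and the helper name `transform` are our assumptions.

```python
import cmath
import math

STEP = 0.05                                     # spacing of the grid
GRID = [(k - 200) * STEP for k in range(401)]   # points from -10 to +10


def transform(values, point, sign):
    """Finite-sum approximation of (1/sqrt(2*pi)) * integral of
    values(t) * exp(sign * i * t * point) dt over the grid."""
    total = sum(v * cmath.exp(sign * 1j * t * point) for t, v in zip(GRID, values))
    return total * STEP / math.sqrt(2 * math.pi)


f_values = [math.exp(-x * x / 2) for x in GRID]             # f(x) on the grid
psi_values = [transform(f_values, w, +1) for w in GRID]     # equation (5a)
f_back = transform(psi_values, 1.3, -1)                     # equation (5b) at x = 1.3
```

For this particular test function the conjugate function is again $e^{-\omega^2/2}$, and applying (5b) to the computed values of $\psi$ returns $f(1.3)$ to within the accuracy of the finite sums.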

7. FREQUENCY CURVE AS THE SOLUTION OF AN INTEGRAL EQUATION

We are now in a position to make use of the semi-invariants of
Thiele, which hitherto in our discussion have appeared as a
rather disconnected and alien member. On page 13 we saw that the
semi-invariants could be expressed by the relation

$$s_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots} = \sum e^{o_i\omega}$$

where $o_i$ ($i = 1, 2, 3, \ldots$) denotes the individual
observations.

The definition of the semi-invariants does not necessitate that
all the $o$'s must be different. If some of the $o$'s are exactly
alike it is self-evident that the term $e^{o_i\omega}$ must be
repeated as often as $o_i$ occurs among all of the observations.
If therefore $N\varphi(o_i)$ denotes the absolute frequency of
$o_i$ where $\varphi(o_i)$ is the relative frequency function,
then the definition of the semi-invariants may be written as:

$$e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots}\sum \varphi(o_i) = \sum \varphi(o_i)\,e^{o_i\omega}.$$

For continuous variates, $x$, the above sums are transformed into
definite integrals of the form

$$e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots}\int_{-\infty}^{+\infty} \varphi(x)\,dx = \int_{-\infty}^{+\infty} \varphi(x)\,e^{x\omega}\,dx.$$

Let us now substitute the quantity $\omega\sqrt{-1}$, or
$i\omega$, for $\omega$ in the above identity. We then have:

$$e^{\frac{\lambda_1}{1!}i\omega + \frac{\lambda_2}{2!}i^2\omega^2 + \frac{\lambda_3}{3!}i^3\omega^3 + \ldots}\int_{-\infty}^{+\infty} \varphi(x)\,dx = \int_{-\infty}^{+\infty} \varphi(x)\,e^{x i\omega}\,dx$$

under the supposition that this transformation holds in the
complex region in which the function is defined.

In this equation the definite integrals are of special
importance. The factor $\int_{-\infty}^{+\infty}\varphi(x)\,dx$
is, of course, equal to unity according to the simple
considerations set forth on page seven. The integral on the right
hand side of the equation is, however, apart from the constant
factor $\sqrt{2\pi}$ nothing more than the $\psi$ function in the
conjugate Fourier function if we let $\varphi(x) = f(x)$, and

$$e^{\frac{\lambda_1}{1!}i\omega + \frac{\lambda_2}{2!}i^2\omega^2 + \frac{\lambda_3}{3!}i^3\omega^3 + \ldots} = \sqrt{2\pi}\,\psi(\omega).$$

According to (5b) we may, therefore, write $f(x)$ or
$\varphi(x)$ as

$$\varphi(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{\frac{\lambda_1}{1!}i\omega + \frac{\lambda_2}{2!}i^2\omega^2 + \frac{\lambda_3}{3!}i^3\omega^3 + \ldots}\, e^{-x i\omega}\,d\omega$$

as the most general form of the frequency function $\varphi(x)$
expressed by means of semi-invariants.

8. FIRST APPROXIMATE SOLUTION

The exactness with which $\varphi(x)$ is reproduced depends, of
course, upon the number of $\lambda$'s we decide to consider in
the above formula. As a first approximation we may omit all
$\lambda$'s above the order 2 or all terms in the exponent with
indices higher than 2. Bearing in mind that $i^2 = -1$ we
therefore have as a first approximation

$$\varphi_0(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{i(\lambda_1 - x)\omega - \frac{\lambda_2\omega^2}{2}}\,d\omega.$$

The above definite integral was first evaluated by Laplace by
means of the following elegant analysis. Using the well known
Eulerean relation for complex quantities the above integral may
be written as

$$\int_{-\infty}^{+\infty} e^{-\frac{\lambda_2\omega^2}{2}}\cos[(\lambda_1 - x)\omega]\,d\omega + i\int_{-\infty}^{+\infty} e^{-\frac{\lambda_2\omega^2}{2}}\sin[(\lambda_1 - x)\omega]\,d\omega.$$

The imaginary member vanishes because the factor
$e^{-\lambda_2\omega^2/2}$ is an even function and
$\sin[(\lambda_1 - x)\omega]$ an uneven function. The area from
$-\infty$ to 0 will therefore equal the area from 0 to $+\infty$,
but be opposite in sign, which reduces the total area from
$-\infty$ to $+\infty$, or the integral in question, to zero.



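In regard to the first term, similar conditions hold except that
$\cos[(\lambda_1 - x)\omega]$ is an even function and the
integral may hence be written as

$$I = \int_{-\infty}^{+\infty} e^{-\frac{\lambda_2\omega^2}{2}}\cos(r\omega)\,d\omega = 2\int_{0}^{\infty} e^{-\frac{\lambda_2\omega^2}{2}}\cos(r\omega)\,d\omega \quad\text{where } r = \lambda_1 - x.$$

Regarding the parameter $r$ as a variable and differentiating $I$
in respect to this variable we have

$$\frac{dI}{dr} = \frac{2}{\lambda_2}\int_{0}^{\infty}\left(-\lambda_2\omega\,e^{-\frac{\lambda_2\omega^2}{2}}\right)\sin(r\omega)\,d\omega.$$

From this we have by partial integration:

$$\frac{dI}{dr} = \frac{2}{\lambda_2}\left[e^{-\frac{\lambda_2\omega^2}{2}}\sin(r\omega)\right]_{0}^{\infty} - \frac{2r}{\lambda_2}\int_{0}^{\infty} e^{-\frac{\lambda_2\omega^2}{2}}\cos(r\omega)\,d\omega = 0 - \frac{rI}{\lambda_2} \quad\text{or}\quad \frac{1}{I}\frac{dI}{dr} = -\frac{r}{\lambda_2}.$$

From which we find

$$\log I = -\frac{r^2}{2\lambda_2} + \log A$$

where $\log A$ is a constant. Hence we have:

$$I = A\,e^{-\frac{r^2}{2\lambda_2}}.$$

In order to determine $A$ we let $r = 0$ and we have

$$I_{r=0} = A = 2\int_{0}^{\infty} e^{-\frac{\lambda_2\omega^2}{2}}\,d\omega = 2\sqrt{\frac{\pi}{2\lambda_2}} = \sqrt{\frac{2\pi}{\lambda_2}}.$$

This finally gives the expression for $\varphi_0(x)$ in the
following form:

$$\varphi_0(x) = \frac{1}{\sqrt{2\pi\lambda_2}}\,e^{-\frac{(x - \lambda_1)^2}{2\lambda_2}}$$

as a preliminary approximation for the frequency curve
$\varphi(x)$.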
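The result of Laplace's analysis can be checked by a direct numerical evaluation of the integral. The Python sketch below compares a finite-sum approximation of $\frac{1}{2\pi}\int e^{-\lambda_2\omega^2/2}\cos[(\lambda_1 - x)\omega]\,d\omega$ with the exponential form to which the analysis leads. The numerical values of $\lambda_1$ and $\lambda_2$, the step and the range of integration are hypothetical choices.

```python
import math


def phi0_by_integration(x, lam1, lam2, step=0.001, half_width=20.0):
    """Finite-sum value of (1/(2*pi)) * integral of
    exp(-lam2*w^2/2) * cos((lam1 - x)*w) dw over [-half_width, half_width]."""
    count = int(round(2 * half_width / step))
    total = 0.0
    for k in range(count + 1):
        w = -half_width + k * step
        total += math.exp(-lam2 * w * w / 2) * math.cos((lam1 - x) * w)
    return total * step / (2 * math.pi)


def phi0_closed_form(x, lam1, lam2):
    """exp(-(x - lam1)^2 / (2*lam2)) / sqrt(2*pi*lam2)."""
    return math.exp(-(x - lam1) ** 2 / (2 * lam2)) / math.sqrt(2 * math.pi * lam2)


# Hypothetical semi-invariants: mean 170 and lambda_2 = 36 (dispersion 6).
numeric = phi0_by_integration(175.0, 170.0, 36.0)
exact = phi0_closed_form(175.0, 170.0, 36.0)
```

The sine term is left out of the sum because, as shown above, it contributes nothing to the integral.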

The first mathematical deduction of this approximate expression
for a frequency curve is found in the monumental work by Laplace
on Probabilities, and the function $\varphi_0(x)$ entering in the
expression $\varphi_0(x)\,dx$, which gives the probability that
the variate will fall between $x - \frac{1}{2}dx$ and
$x + \frac{1}{2}dx$, is therefore known as the Laplacean
probability function or sometimes as the Normal Frequency Curve
of Laplace. The same curve was, as we have mentioned previously,
also deduced independently by Gauss in connection with his
studies on the distribution of accidental errors in precision
measurements.

Laplace's probability function, $\varphi_0(x)$, possesses some
remarkable properties which it might be well worth while to
consider. Introducing a slightly different system of notation by
writing $\lambda_1 = M$ and $\sqrt{\lambda_2} = \sigma$,
$\varphi_0(x)$ reduces to the following form

$$\varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - M)^2}{2\sigma^2}}$$

which is the form introduced by Pearson.

The frequency curve, $\varphi_0(x)$, is here expressed in
reference to a Cartesian coordinate system with origin at the
zero point of the natural number system and whose unit of
measurement is also equivalent to the natural number unit. It is,
however, not necessary to use this system in preference to any
other system. In fact, we may choose arbitrarily any other origin
and any other unit standard without altering the properties of
the curve. Suppose, therefore, that we take $M$ as the origin and
$\sigma$ as the unit of the system. The frequency function then
reduces to

$$\varphi_0(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2 : 2}.$$

Since the integral of $\varphi_0(x)$ from $-\infty$ to $+\infty$
equals unity the following equation must necessarily hold.

$$\int_{-\infty}^{+\infty} e^{-x^2 : 2}\,dx = \sqrt{2\pi}.$$

9. DEVELOPMENT BY POLYNOMIALS

The Laplacean probability function possesses, however, some other
remarkable properties which are of great use in expanding a
function in a series. Starting with $\varphi_0(x)$ we may by
repeated differentiation obtain its various derivatives. Denoting
such derivatives by $\varphi_1(x)$, $\varphi_2(x)$,
$\varphi_3(x)$, ... respectively we have the following
relations. 1)

$$\varphi_0(x) = e^{-x^2 : 2}$$
$$\varphi_1(x) = -x\,\varphi_0(x)$$
$$\varphi_2(x) = (x^2 - 1)\,\varphi_0(x)$$
$$\varphi_3(x) = -(x^3 - 3x)\,\varphi_0(x)$$
$$\varphi_4(x) = (x^4 - 6x^2 + 3)\,\varphi_0(x)$$

and in general for the $n$th derivative:

$$\varphi_n(x) = (-1)^n\left[x^n - \frac{n(n-1)}{2}x^{n-2} + \frac{n(n-1)(n-2)(n-3)}{2\cdot 4}x^{n-4} - \frac{n(n-1)(n-2)(n-3)(n-4)(n-5)}{2\cdot 4\cdot 6}x^{n-6} + \ldots\right]\varphi_0(x).$$

1) In the following computations we have omitted temporarily the
constant factor $1 : \sqrt{2\pi}$ of $\varphi_0(x)$ and its
derivatives.

It can be readily seen that the derivatives of $\varphi_0(x)$ are
represented throughout as products of polynomials of $x$ and the
function $\varphi_0(x)$ itself. The various polynomials

$$H_0(x) = 1$$
$$H_1(x) = -x$$
$$H_2(x) = x^2 - 1$$
$$H_3(x) = -(x^3 - 3x)$$
$$H_4(x) = x^4 - 6x^2 + 3$$

and so forth are generally known as Hermite's polynomials from
the name of the French mathematician, Hermite, who first
introduced these polynomials in mathematical analysis.

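The following relations can be shown to exist between three
consecutive polynomials, taken with the alternating sign used in
the list above,

$$H_{n+1}(x) + xH_n(x) + nH_{n-1}(x) = 0$$

and

$$\frac{d^2H_n(x)}{dx^2} - \frac{x\,dH_n(x)}{dx} + nH_n(x) = 0.$$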



A numerical ten decimal place tabulation of the first six Hermite
polynomials for values of $x$ up to 4 and progressing by
intervals of 0.01 is given by Jørgensen in his Danish work
"Frekvensflader og Korrelation".
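The polynomials may also be generated mechanically. Since $\varphi_{n+1}(x)$ is the derivative of $\varphi_n(x) = H_n(x)\varphi_0(x)$ and the derivative of $\varphi_0(x)$ is $-x\varphi_0(x)$, each polynomial follows from its predecessor by the rule $H_{n+1}(x) = H_n'(x) - xH_n(x)$. The Python sketch below applies this rule to lists of coefficients; the representation (ascending powers of x) and the function names are our own choices.

```python
def next_hermite(h):
    """Given the coefficients of H_n (ascending powers of x), return those
    of H_{n+1} = H_n' - x * H_n."""
    derivative = [k * h[k] for k in range(1, len(h))]   # coefficients of H_n'
    times_x = [0] + h                                   # coefficients of x * H_n
    out = [0] * len(times_x)
    for k, d in enumerate(derivative):
        out[k] += d
    for k, t in enumerate(times_x):
        out[k] -= t
    return out


def hermite_table(max_order):
    """Coefficient lists of H_0 .. H_max_order, starting from H_0 = 1."""
    table = [[1]]
    for _ in range(max_order):
        table.append(next_hermite(table[-1]))
    return table


H = hermite_table(4)   # H[3] holds -(x^3 - 3x), H[4] holds x^4 - 6x^2 + 3
```

The alternating sign of the polynomials of uneven order arises of itself from the repeated differentiation.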

There exist now some very important relations between the Hermite
polynomials and the derivatives of $\varphi_0(x)$, or between
$H_n(x)$ and $\varphi_n(x)$. Consider for the moment the two
following series of functions

$$\varphi_0(x),\ \varphi_1(x),\ \varphi_2(x),\ \varphi_3(x),\ \varphi_4(x),\ \ldots$$
$$H_0(x),\ H_1(x),\ H_2(x),\ H_3(x),\ H_4(x),\ \ldots$$

where $\varphi_n(x) = H_n(x)\varphi_0(x)$ and where
$\lim \varphi_n(x) = 0$ for $x = \pm\infty$.

We shall now prove that the two series $\varphi_n(x)$ and
$H_n(x)$ form a biorthogonal system in the interval $-\infty$ to
$+\infty$, that is to say that they are

(1) real and continuous in the whole plane

(2) no one of them is identically zero in the plane

(3) every pair of them, $\varphi_n(x)$ and $H_m(x)$, satisfy the
relation

$$\int_{-\infty}^{+\infty} \varphi_n(x)H_m(x)\,dx = 0 \qquad (n \neq m).$$

We have the self evident relation (letting $x = z$)

$$\int_{-\infty}^{+\infty} H_m(z)\varphi_n(z)\,dz = \int_{-\infty}^{+\infty} H_m(z)H_n(z)\varphi_0(z)\,dz = \int_{-\infty}^{+\infty} H_n(z)\varphi_m(z)\,dz.$$



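Since this relation holds for all values of $m$ and $n$ it is
only necessary to prove the proposition for $n > m$. For if it
holds for $n > m$ it will according to the above relation also
hold for $n < m$.

By partial integration we have finally that

$$\int_{-\infty}^{+\infty} H_m(z)\varphi_n(z)\,dz = \Big[H_m(z)\varphi_{n-1}(z)\Big]_{-\infty}^{+\infty} - \int_{-\infty}^{+\infty} H'_m(z)\varphi_{n-1}(z)\,dz$$

where $H'_m(z)$ is the first derivative of $H_m(z)$. The first
term on the right reduces to zero. We have therefore:

$$\int_{-\infty}^{+\infty} H_m(z)\varphi_n(z)\,dz = -\int_{-\infty}^{+\infty} H'_m(z)\varphi_{n-1}(z)\,dz = +\int_{-\infty}^{+\infty} H''_m(z)\varphi_{n-2}(z)\,dz = -\int_{-\infty}^{+\infty} H'''_m(z)\varphi_{n-3}(z)\,dz.$$

Continuing this process we obtain finally an expression of the
form

$$(-1)^{m+1}\int_{-\infty}^{+\infty} H_m^{(m+1)}(z)\,\varphi_{n-m-1}(z)\,dz,$$

where $H_m^{(m+1)}(z)$ is the $(m+1)$th derivative of $H_m(z)$.
Since $H_m(z)$ is a polynomial in $z$ of the $m$th degree this
derivative is zero and

$$\int_{-\infty}^{+\infty} H_m(z)\varphi_n(z)\,dz = 0$$

for all values of $m$ and $n$ where $n \neq m$.

For $m = n$ we proceed in exactly the same manner, but stop at
the $m$th integration. We have, therefore, by replacing $m$ by
$n$ in the above partial integrations

$$\int_{-\infty}^{+\infty} H_n(z)\varphi_n(z)\,dz = (-1)^n\int_{-\infty}^{+\infty} H_n^{(n)}(z)\,\varphi_{n-n}(z)\,dz = (-1)^n\int_{-\infty}^{+\infty} H_n^{(n)}(z)\,\varphi_0(z)\,dz.$$

The $n$th derivative of $H_n(z)$ is, however, nothing but a
constant and equal to $(-1)^n\,n!$. Hence we have finally

$$\int_{-\infty}^{+\infty} H_n(z)\varphi_n(z)\,dz = (-1)^{2n}\,n!\int_{-\infty}^{+\infty} e^{-z^2 : 2}\,dz = n!\sqrt{2\pi}.$$

The above analysis thus proves that the functions $H_m(z)$ and
$\varphi_n(z)$ are biorthogonal to each other for all values of
$n$ different from $m$ throughout the whole plane.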
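The biorthogonal property lends itself to a direct numerical check. The Python sketch below forms a finite-sum approximation of the integral of $\varphi_n(z)H_m(z)$ for the first five orders, using the polynomials with alternating sign as listed earlier and $\varphi_0(z) = e^{-z^2/2}$ without the constant factor. The step and the range of integration are hypothetical choices.

```python
import math

# H_0 .. H_4 with the alternating sign of the derivatives of phi_0.
HERMITE = [
    lambda z: 1.0,
    lambda z: -z,
    lambda z: z * z - 1,
    lambda z: -(z ** 3 - 3 * z),
    lambda z: z ** 4 - 6 * z * z + 3,
]


def inner(n, m, step=0.01, half_width=12.0):
    """Finite-sum approximation of the integral of phi_n(z) * H_m(z) dz,
    where phi_n(z) = H_n(z) * exp(-z^2 / 2)."""
    count = int(round(2 * half_width / step))
    total = 0.0
    for k in range(count + 1):
        z = -half_width + k * step
        total += HERMITE[n](z) * math.exp(-z * z / 2) * HERMITE[m](z)
    return total * step


# Table of all integrals for orders 0..4.
gram = [[inner(n, m) for m in range(5)] for n in range(5)]
```

The entries off the diagonal come out as zero to within rounding, while the diagonal entries reproduce the values $n!\sqrt{2\pi}$.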

We can now make use of these relations between the infinite set
of biorthogonal functions $H_m(z)$ and $\varphi_n(z)$ in solving
the problem of expanding an arbitrary function $\varphi(z)$ in a
series of the form

$$\varphi(z) = c_0\varphi_0(z) + c_1\varphi_1(z) + c_2\varphi_2(z) + \ldots + c_i\varphi_i(z) + \ldots,$$

the series to hold in the interval from $-\infty$ to $+\infty$.

If we know that $\varphi(z)$ can be developed into a series of
this form, which after multiplication by any continuous function
can be integrated term for term, then we are able to give a
formal determination of the coefficients $c$.

This formal determination of any one of the $c$'s, say $c_i$,
consists in multiplying the above series by $H_i(z)$ and
integrating each term from $-\infty$ to $+\infty$. All the terms
except the one containing the product $H_i(z)\varphi_i(z)$ vanish
and we have for $c_i$,

$$c_i = \frac{\int_{-\infty}^{+\infty} \varphi(z)H_i(z)\,dz}{\int_{-\infty}^{+\infty} \varphi_i(z)H_i(z)\,dz} = \frac{\int_{-\infty}^{+\infty} \varphi(z)H_i(z)\,dz}{i!\sqrt{2\pi}}.$$

If we define the Hermite functions as

$$H_0(z) = 1$$
$$H_1(z) = z$$
$$H_2(z) = z^2 - 1$$
$$H_3(z) = z^3 - 3z$$
$$H_4(z) = z^4 - 6z^2 + 3$$

the above formula takes on the form

$$c_i = \frac{\int_{-\infty}^{+\infty} \varphi(z)H_i(z)\,dz}{\int_{-\infty}^{+\infty} \varphi_i(z)H_i(z)\,dz} = \frac{\int_{-\infty}^{+\infty} \varphi(z)H_i(z)\,dz}{(-1)^i\,i!\sqrt{2\pi}}$$

which we shall prefer to use in the following discussion.

It will be noted that this purely formal calculation of the
coefficients $c$ is very similar to the determination of the
constants in a Fourier Series, where as a matter of fact the
system of functions

$$\cos z,\ \cos 2z,\ \cos 3z,\ \ldots$$
$$\sin z,\ \sin 2z,\ \sin 3z,\ \ldots$$

is biorthogonal in the interval $0 < z < 2\pi$.

But the reader must not forget that the above representation is
only a formal one, and we do not know if it is valid. To prove
its validity we must first show that the series is convergent and
secondly that it actually represents $\varphi(z)$ for all values
of $z$.

This is by no means a simple task and it cannot be done by
elementary methods. A Russian mathematician, Vera
Myller-Lebedeff, has, however, given an elegant solution by means
of some well known theorems from the Fredholm integral equations.
She has among other things proved the following criterion:

"Every function $\varphi(z)$ which together with its first two
derivatives is finite and continuous in the interval from
$-\infty$ to $+\infty$ and which vanishes together with its
derivatives for $z = \pm\infty$ can be developed into an infinite
series of the form:

$$\varphi(z) = \varphi_0(z)\sum_{i=0}^{\infty} c_i H_i(z)$$

where $H_i(z)$ is the Hermite polynomial of order $i$".



10. GRAM'S SERIES

It is, however, not our intention to follow up this treatment,
which is outside the scope of an elementary treatise like this,
and we shall in its place give an approximate representation of
the frequency function, $\varphi(z)$, by a method which in many
respects is similar to that introduced by the Danish actuary Gram
in his epoch-making work "Udviklingsrækker", which contains the
first known systematic development of a skew frequency function.
Gram's problem in a somewhat modified form may briefly be stated
as follows: Being given an arbitrary relative frequency function,
$\varphi(z)$, continuous and finite in the interval $-\infty$ to
$+\infty$ (and which vanishes for $z = \pm\infty$), to determine
the constant coefficients $c_0, c_1, c_2, c_3, \ldots$ in such a
way that the series

$$\frac{c_0\varphi_0(z)}{\sqrt{\varphi_0(z)}} + \frac{c_1\varphi_1(z)}{\sqrt{\varphi_0(z)}} + \frac{c_2\varphi_2(z)}{\sqrt{\varphi_0(z)}} + \ldots + \frac{c_n\varphi_n(z)}{\sqrt{\varphi_0(z)}} = \frac{1}{\sqrt{\varphi_0(z)}}\sum c_i\varphi_i(z)$$

gives the best approximation to the quantity
$\varphi(z) : \sqrt{\varphi_0(z)}$ in the sense of the method of
least squares. That is to say we wish to determine the constants
$c$ in such a manner that the sum of the squares of the
differences between the function and the approximate series
becomes a minimum. This means that the expression

$$\int_{-\infty}^{+\infty}\left[\frac{\varphi(z)}{\sqrt{\varphi_0(z)}} - \frac{1}{\sqrt{\varphi_0(z)}}\sum c_i\varphi_i(z)\right]^2 dz$$

must be a minimum.

On the basis of this condition we have

$$\frac{1}{\sqrt{\varphi_0(z)}}\sum c_i\varphi_i(z) = \sqrt{\varphi_0(z)}\left[c_0H_0(z) + c_1H_1(z) + \ldots + c_nH_n(z)\right] = U(z)$$

where the unknown coefficients $c$ must be so determined that

$$I = \int_{-\infty}^{+\infty}\left[\frac{\varphi(z)}{\sqrt{\varphi_0(z)}} - U(z)\right]^2 dz \quad\text{equals a minimum.}$$

Taking the partial derivatives in respect to $c_i$ we have

$$\frac{\partial I}{\partial c_i} = -2\frac{\partial}{\partial c_i}\int_{-\infty}^{+\infty}\frac{\varphi(z)}{\sqrt{\varphi_0(z)}}\,U(z)\,dz + \frac{\partial}{\partial c_i}\int_{-\infty}^{+\infty}[U(z)]^2\,dz.$$

Now since

$$\int_{-\infty}^{+\infty}[U(z)]^2\,dz = \int_{-\infty}^{+\infty}\left\{c_0^2[H_0(z)]^2 + c_1^2[H_1(z)]^2 + \ldots + c_n^2[H_n(z)]^2\right\}\varphi_0(z)\,dz,$$

we get

$$\frac{\partial I}{\partial c_i} = -2\int_{-\infty}^{+\infty}\frac{\varphi(z)}{\sqrt{\varphi_0(z)}}\,H_i(z)\sqrt{\varphi_0(z)}\,dz + 2c_i\int_{-\infty}^{+\infty}[H_i(z)]^2\varphi_0(z)\,dz$$

where the latter integral equals

$$\int_{-\infty}^{+\infty}\varphi_i(z)H_i(z)\,dz = (-1)^i\,i!\sqrt{2\pi}.$$

Equating to zero and solving for $c_i$ we finally obtain the
following value for $c_i$:

$$c_i = \frac{(-1)^i}{i!\sqrt{2\pi}}\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz \qquad (i = 1, 2, 3, \ldots).$$

This solution is gotten by the introduction of
$\sqrt{\varphi_0(z)}$ which serves to make all terms of the form
$c_i\varphi_i(z) : \sqrt{\varphi_0(z)} = \sqrt{\varphi_0(z)}\,c_iH_i(z)$
($i = 1, 2, 3, \ldots, n$) orthogonal to each other in the
interval $-\infty$ to $+\infty$.

In all the above expansions of a frequency series we have used
the expression $\varphi_0(z) = e^{-z^2 : 2}$ as the generating
function (see footnote on page 26), while as a matter of fact the
true value of $\varphi_0(z)$ is given by the equation
$\varphi_0(z) = e^{-z^2 : 2} : \sqrt{2\pi}$.

The definite integral on page 32

$$(-1)^i\int_{-\infty}^{+\infty} H_i(z)\varphi_i(z)\,dz = i!\int_{-\infty}^{+\infty} e^{-z^2 : 2}\,dz = i!\sqrt{2\pi}$$

will therefore have to be divided by $\sqrt{2\pi}$, and the value
of the general coefficient $c_i$ will henceforth be reduced to

$$c_i = \frac{(-1)^i}{i!}\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz$$

where $H_i(z)$ is the Hermite polynomial of order $i$ defined by
the relation

$$H_i(z) = z^i - \frac{i(i-1)}{2}z^{i-2} + \frac{i(i-1)(i-2)(i-3)}{2\cdot 4}z^{i-4} - \frac{i(i-1)(i-2)(i-3)(i-4)(i-5)}{2\cdot 4\cdot 6}z^{i-6} + \ldots$$

On this basis we obtain the following values for the first four
coefficients:

$$c_1 = (-1)^1\int_{-\infty}^{+\infty} z\,\varphi(z)\,dz : 1!$$
$$c_2 = (-1)^2\int_{-\infty}^{+\infty} (z^2 - 1)\,\varphi(z)\,dz : 2!$$
$$c_3 = (-1)^3\int_{-\infty}^{+\infty} (z^3 - 3z)\,\varphi(z)\,dz : 3!$$
$$c_4 = (-1)^4\int_{-\infty}^{+\infty} (z^4 - 6z^2 + 3)\,\varphi(z)\,dz : 4!$$
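When the frequency function is given only through a series of individual observations, the integrals above may be replaced by averages over the observations. This substitution is our assumption for the sake of illustration, as are the sample values and the function name in the following Python sketch, which computes $c_0$ to $c_4$ with the Hermite polynomials of positive leading sign.

```python
from math import factorial

# Hermite polynomials H_0 .. H_4 with positive leading coefficient.
HERMITE = [
    lambda z: 1,
    lambda z: z,
    lambda z: z * z - 1,
    lambda z: z ** 3 - 3 * z,
    lambda z: z ** 4 - 6 * z * z + 3,
]


def gram_coefficients(sample):
    """c_i = ((-1)^i / i!) * average of H_i(z) over the sample, the average
    standing in for the integral of phi(z) * H_i(z) dz."""
    n = len(sample)
    return [
        (-1) ** i / factorial(i) * sum(h(z) for z in sample) / n
        for i, h in enumerate(HERMITE)
    ]


# Hypothetical observations, not yet referred to mean and dispersion.
sample = [-1.5, -0.5, 0.0, 0.5, 2.5]
c = gram_coefficients(sample)
```

Because the sample has not been referred to its mean as origin and its dispersion as unit, the coefficients $c_1$ and $c_2$ do not vanish here.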



While the above development of an arbitrary frequency
distribution has reference to $\varphi(z)$, or the relative
frequency function, it is, however, equally well adapted to the
representation of absolute frequencies as expressed by the
function $F(z)$. If $N$ is the total number of individual
observations, or in other words the area of the frequency curve,
we evidently have

$$F(z) = N\varphi(z) \quad\text{or}\quad \int_{-\infty}^{+\infty} F(z)\,dz = N\int_{-\infty}^{+\infty}\varphi(z)\,dz = N.$$

Since $N$ is a constant quantity we may, therefore, write the
expansion of $F(z)$ as follows:

$$F(z) = N\left[c_0\varphi_0(z) + c_1\varphi_1(z) + c_2\varphi_2(z) + \ldots\right] = N\sum c_i\varphi_i(z)$$

where the coefficients $c_i$ have the value

$$c_i = \frac{(-1)^i}{i!\,N}\int_{-\infty}^{+\infty} F(z)H_i(z)\,dz \quad\text{for } i = 1, 2, 3, \ldots$$

and where

$$N = \int_{-\infty}^{+\infty} F(z)\,dz.$$

Since all the Hermite functions are polynomials in $z$, it can be
readily seen that the coefficients $c$ may be expressed as
functions of the power sums or of the previously mentioned
symmetrical functions $s$, where

$$s_r = \int_{-\infty}^{+\infty} z^r F(z)\,dz.$$

These particular integrals, originally introduced by Thiele in
the development of the semi-invariants, have been called by
Pearson the "moments" of the frequency function, $F(z)$, and
$s_r$ is called the $r$th moment of the variate $z$ with respect
to an arbitrary origin.

It can be readily seen that the moment of order zero, or $s_0$,
is

$$s_0 = \int_{-\infty}^{+\infty} z^0 F(z)\,dz = N = N\int_{-\infty}^{+\infty}\varphi(z)\,dz.$$

Hence we have for the first coefficient $c_0$:

$$c_0 = \int_{-\infty}^{+\infty} F(z)\,dz : \int_{-\infty}^{+\infty} F(z)\,dz = 1.$$

We are, however, in a position to further simplify the
expression for $F(z)$.

As already mentioned we are at liberty to choose arbitrarily both
the origin and the unit of the Cartesian coordinate system for
the frequency curve without changing the properties of this
curve. Now by making a proper choice of the Cartesian system of
reference we can make the coefficients $c_1$ and $c_2$ vanish. In
order to obtain this object the origin of the system must be so
chosen that

$$c_1 = -\frac{1}{1!}\int_{-\infty}^{+\infty} zF(z)\,dz : \int_{-\infty}^{+\infty} F(z)\,dz = 0.$$

This means that the semi-invariant $s_1 : s_0 = \lambda_1$ must
vanish. It can be readily seen that the above expression for
$\lambda_1$ is nothing more than the usual form for the mean
value of a series of variates. Moreover, we know that the
algebraic sum (or in the case of continuous variates, the
integral) of the variates around the mean value is always equal
to zero. Hence by writing for $z$ the expression $(z - M)$ where
$M$ equals the mean value or $\lambda_1$ we can always make $c_1$
vanish.

To attain our second object of making $c_2$ vanish we must choose
the unit of the coordinate system in such a way that the
expression

$$c_2 = \frac{(-1)^2}{2!}\int_{-\infty}^{+\infty} F(z)H_2(z)\,dz : \int_{-\infty}^{+\infty} F(z)\,dz = 0$$

which implies that

$$\left[\int_{-\infty}^{+\infty} F(z)z^2\,dz - \int_{-\infty}^{+\infty} F(z)\,dz\right] : \int_{-\infty}^{+\infty} F(z)\,dz = 0$$

or that $s_2 : s_0 - 1 = 0$, or when expressed in terms of the
semi-invariants that

$$\lambda_2 = (s_2 s_0 - s_1^2) : s_0^2 = 1.$$

But by choosing the mean as the origin of the system the term
$s_1 : s_0$ is equal to 0 and we have therefore
$\lambda_2 = \sigma^2 = s_2 : s_0 = 1$. Hence, by selecting as
the unit of our coordinate system $\sqrt{\lambda_2}$ or $\sigma$,
where $\sigma$ is technically known as the dispersion or standard
deviation of the series of variates, we can make the second
coefficient $c_2$ vanish.

In respect to the coefficients $c_3$ and $c_4$ we have now

$$c_3 = \frac{(-1)^3}{3!}\left[\int_{-\infty}^{+\infty} z^3F(z)\,dz - 3\int_{-\infty}^{+\infty} zF(z)\,dz\right] : \int_{-\infty}^{+\infty} F(z)\,dz$$

which reduces to $-\dfrac{s_3}{3!\,s_0}$, while

$$c_4 = \frac{(-1)^4}{4!}\left[\int_{-\infty}^{+\infty} z^4F(z)\,dz - 6\int_{-\infty}^{+\infty} z^2F(z)\,dz + 3\int_{-\infty}^{+\infty} F(z)\,dz\right] : \int_{-\infty}^{+\infty} F(z)\,dz$$

which reduces to

$$\left[\frac{s_4}{s_0} - \frac{6s_2}{s_0} + \frac{3s_0}{s_0}\right] : 4! = \left[\frac{s_4}{s_0} - 3\right] : 4!.$$

While the coefficients of higher order may be determined with
equal ease, it will in general be found that the majority of
moderately skew frequency distributions can be expressed by means
of the first 4 parameters or coefficients.
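The whole reduction can be sketched in a few lines of Python. The observations are first referred to their mean as origin and their dispersion as unit, the coefficients are then computed from the power sums, and the four-parameter series $\varphi_0(z) + c_3\varphi_3(z) + c_4\varphi_4(z)$ is evaluated with $\varphi_0$ taken as the normal curve including its factor $1 : \sqrt{2\pi}$. The data and the function names are hypothetical; this is an illustrative arrangement, not a computing scheme taken from the text.

```python
import math


def standardized_coefficients(observations):
    """Refer the data to mean and dispersion, then return (mean, sigma, (c1, c2, c3, c4))."""
    n = len(observations)
    mean = sum(observations) / n
    sigma = math.sqrt(sum((o - mean) ** 2 for o in observations) / n)
    z = [(o - mean) / sigma for o in observations]
    s = [sum(v ** r for v in z) for r in range(5)]   # power sums s_0 .. s_4
    c1 = -s[1] / s[0]
    c2 = (s[2] / s[0] - 1) / 2
    c3 = -s[3] / (6 * s[0])
    c4 = (s[4] / s[0] - 3) / 24
    return mean, sigma, (c1, c2, c3, c4)


def gram_series(z, c3, c4):
    """phi_0(z) + c3*phi_3(z) + c4*phi_4(z), phi_0 being the unit normal curve."""
    phi0 = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    phi3 = -(z ** 3 - 3 * z) * phi0          # third derivative of phi_0
    phi4 = (z ** 4 - 6 * z * z + 3) * phi0   # fourth derivative of phi_0
    return phi0 + c3 * phi3 + c4 * phi4


# Hypothetical observations (illustrative only).
observations = [1, 2, 2, 3, 7]
mean, sigma, (c1, c2, c3, c4) = standardized_coefficients(observations)

# Area under the series, by a finite sum from -12 to +12.
area = sum(gram_series(-12 + k * 0.01, c3, c4) for k in range(2401)) * 0.01
```

With this choice of origin and unit $c_1$ and $c_2$ vanish to within rounding, and the area under the series remains unity because the derivatives of $\varphi_0$ contribute nothing to it.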



11. COEFFICIENTS We shall now show how the 
semF'?nvj5uants same results for the values of 

the ooefficientB may be ob- 
tained from the definition of the semi-invariants. 
Since we have proven that a frequency function, 
P{z), may be expressed by the series 
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we may from the definition of the semi-invariants 
write down the following identity: — 



SqC 






+ 00 



00 



where N is the area of the frequency curve. 

The general term on the right hand side of 
the equation will be of the form 

+ 00 
— 00 

where the integral may be evaluated by partial 
integration as follows : — 

+ 00 +3e +00 

— 00 — 00 — 00 

and where the first term on the right vanishes 
leaving 

+ 00 +00 



— 00 — 00 



Continuing in the same manner we obtain by 
successive integrations 
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— 00 — 00 

-f-oo -f-* 

(—09)2 J e'"'<p,_2(2)<fe = (—«>)» J e'°'<pr-3{<')de 



— 00 — 00 



from which we finally obtain the relation 

-}-00 +* 

5 e"°<Pr(2)<fe = (-co)' 5 e*°'<P(,(2)<fe 



1/2 



?$ 



e dz. 



— QD 



This latter integral may be written as 









1/2 



?'1 



— 00 



0)2 09* 



j/2^ '^ 

Consequently the relation between the semi-in- 
variants and the frequency function may be writ- 
ten as follows : — 
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X,C9 X.co* 



c .LI ' 12. __ 

N [cq qOD + CgCO^ CgCO^ 4- . . .] c ^ , 



or 



Xi© 09* ,, 

+ rT(^«-i)+--- 



SqB 



Li. ^ LI _ 

N [cq — qco + Cgco — Cgco^ + •••]• 



By successive differentiation with respect to (o 
and by equating the coefficients of equal powers 
of CO we get in a manner similar to that shown 
on page 13 the following results : — 



P — fo _ ^ _ 1 



Ci = — Xi 



C2 



1 [(X^_l) + Xj] 



Cs = -13- [Xs + 3 (X, — 1) Xj + X J] 

Ca = r|[X,+4X3Xi+3(X,-l)«+6(X2-l)Xj+Xt]. 

If we now again choose the origin at X^, or 
let Xj = 0, and choose j/Xg = 1 as the unit of our 
coordinate system we have : — 

Cq = 1, Ci = U, Cg = 0, Cq = I n Xg, C4 = rT \' 
12. LINEAR TRANSFORMATION

The theoretical development of the above formulae explicitly
assumes that the variate, z, is measured in terms of the
dispersion or $\sqrt{\lambda_2(z)}$ and with $\lambda_1(z)$ as the origin of
the coordinate system. In practice the observations or
statistical data are, however, invariably expressed with
reference to an arbitrarily chosen origin (in the majority of
cases the natural zero of the number scale) and expressed in
terms of standard units, such as centimeters, grams, years,
integral numbers, etc. Let us denote the general variate in
such arbitrarily selected systems of reference by x. Our
problem then consists in transforming the various
semi-invariants, $\lambda_1(x), \lambda_2(x), \lambda_3(x), \lambda_4(x), \dots$
to the z system of reference with $\lambda_1(z)$ as its origin and
$\sqrt{\lambda_2(z)}$ as its unit. Such a transformation may always be
brought about by means of the linear substitution

$$z = ax + b$$

which in a purely geometrical sense implies both a change of
origin and unit. On page 16 we proved the following general
properties of the semi-invariants

$$\lambda_1(z) = \lambda_1(ax+b) = a\lambda_1(x) + b$$

$$\lambda_r(z) = \lambda_r(ax+b) = a^r\lambda_r(x).$$

Let us now write $\lambda_1(x) = M$ and $\lambda_2(x) = \sigma^2$,
we then have the following relations:

$$\lambda_1(z) = aM + b, \qquad \lambda_2(z) = a^2\sigma^2.$$

Since the coordinate system of reference must be chosen in such
a manner that $\lambda_1(z) = 0$ and $\sqrt{\lambda_2(z)} = 1$ we have:

$$aM + b = 0, \qquad a\sigma = 1,$$

from which we obtain $a = 1 : \sigma$ and $b = -M : \sigma$, which brings z
on the form $z = (x - M) : \sigma$, while $\varphi_0(z)$ becomes

$$\varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-M)^2}{2\sigma^2}}.$$

Moreover, we have $\lambda_r(z) = \lambda_r(x) : \sigma^r$ for all
values of $r \geq 2$. We are now able to epitomize
the computations of the semi-invariants under the
following simple rules.

(1) Compute $\lambda_1(x)$ in respect to an arbitrary
origin. The numerical value of this parameter
with opposite sign is the origin of the
frequency curve.

(2) Compute $\lambda_r(x)$ for all values of $r \geq 2$. The
numerical values of those parameters divided
by $(\sqrt{\lambda_2(x)})^r$, or $\sigma^r$, for r = 2, 3, 4, ...
are the semi-invariants of the frequency
curve.
13. CHARLIER'S SCHEME OF COMPUTATION

The general formulae for the semi-invariants were given on
page 13. In practical work it is, however, of importance to
proceed along systematic lines and to furnish an automatic
check for the correctness of the computations. Several systems
facilitating such work have been proposed by various writers,
but the most simple and elegant is probably the one proposed by
M. Charlier, which is shown in detail with the necessary
control checks on the following page. Charlier employs moments,
while we in the following demonstration shall prefer the use of
the semi-invariants.

If we define the power sums of the relative
frequencies $\varphi(x)$ by the relation

$$m_r = \int_{-\infty}^{+\infty} x^r F(x)\,dx : \int_{-\infty}^{+\infty} F(x)\,dx \qquad (r = 0, 1, 2, 3, \dots),$$

we find that the expressions for the semi-invariants
as given on page 13 may be written as follows:

$$\lambda_1 = m_1$$

$$\lambda_2 = m_2 - m_1^2$$

$$\lambda_3 = m_3 - 3m_2m_1 + 2m_1^3$$

$$\lambda_4 = m_4 - 4m_3m_1 - 3m_2^2 + 12m_2m_1^2 - 6m_1^4$$
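
The four relations above translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the two-point distribution used as an example is not taken from the text. For that distribution (values 0 and 1 with equal frequency) the semi-invariants are known in closed form, which makes it a convenient check.

```python
def relative_power_sums(xs, freqs, order=4):
    """m_0..m_order: power sums of x divided by the total frequency."""
    total = sum(freqs)
    return [sum(f * x**r for x, f in zip(xs, freqs)) / total
            for r in range(order + 1)]


def semi_invariants(m1, m2, m3, m4):
    """First four semi-invariants from the relative power sums."""
    l1 = m1
    l2 = m2 - m1**2
    l3 = m3 - 3 * m2 * m1 + 2 * m1**3
    l4 = m4 - 4 * m3 * m1 - 3 * m2**2 + 12 * m2 * m1**2 - 6 * m1**4
    return l1, l2, l3, l4


# Illustrative data: the values 0 and 1, each observed once.
m = relative_power_sums([0, 1], [1, 1])
lam = semi_invariants(m[1], m[2], m[3], m[4])
```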



The advantage of the Charlier scheme for the
computation of the semi-invariants lies in the fact
that it furnishes an automatic check of the
final results. If we expand the expression
$(x+1)^4F(x)$ we have:

$$(x+1)^4F(x) = (x^4 + 4x^3 + 6x^2 + 4x + 1)\,F(x)$$

or, summed over all values of x,

$$\sum (x+1)^4F(x) = \sum x^4F(x) + 4\sum x^3F(x) + 6\sum x^2F(x) + 4\sum xF(x) + \sum F(x)$$

which serves as an independent control check of
the computations. Moreover, another check is
furnished by the relation

$$m_4 = \lambda_4 + 4m_1\lambda_3 + 6m_1^2\lambda_2 + 3\lambda_2^2 + m_1^4.$$

In order to illustrate the scheme we choose the 
following age distribution of 1130 pensioned func- 
tionaries in a large American Public Utility cor- 
poration. 



Ages      No. of Pensioners      Ages      No. of Pensioners
35-39             1              65-69            286
40-44             6              70-74            248
45-49            17              75-79            128
50-54            48              80-84             38
55-59           118              85-89             13
60-64           224              over 90            3
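
The two control checks can be run on the table above. This is a hedged sketch rather than Charlier's printed scheme: it assumes 5-year classes numbered from a provisional origin at the class 65-69, and it treats "over 90" as the single class 90-94. The variable names are illustrative.

```python
# Class index x in 5-year units; x = 0 is the class 65-69 (provisional origin).
# Assumption: "over 90" is treated as the class 90-94, i.e. x = 5.
freqs = [1, 6, 17, 48, 118, 224, 286, 248, 128, 38, 13, 3]
xs = list(range(-6, 6))


def power_sums(xs, freqs, order=4):
    """Unadjusted power sums s_0..s_order."""
    return [sum(f * x**r for x, f in zip(xs, freqs)) for r in range(order + 1)]


s = power_sums(xs, freqs)

# First check: the sum of (x+1)^4 F(x) against the binomial expansion.
shifted = sum(f * (x + 1) ** 4 for x, f in zip(xs, freqs))
first_check = shifted == s[4] + 4 * s[3] + 6 * s[2] + 4 * s[1] + s[0]

# Second check: m_4 rebuilt from the semi-invariants.
m = [v / s[0] for v in s]
lam2 = m[2] - m[1] ** 2
lam3 = m[3] - 3 * m[2] * m[1] + 2 * m[1] ** 3
lam4 = (m[4] - 4 * m[3] * m[1] - 3 * m[2] ** 2
        + 12 * m[2] * m[1] ** 2 - 6 * m[1] ** 4)
rebuilt = lam4 + 4 * m[1] * lam3 + 6 * m[1] ** 2 * lam2 + 3 * lam2 ** 2 + m[1] ** 4
second_check = abs(m[4] - rebuilt) < 1e-9
```

Both checks are identities, so they detect arithmetical slips in the power sums but not errors in the data themselves.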



The complete calculations of the coefficients c
are shown in the appended scheme by Charlier.

The above computations give the numerical values of the
frequency function, which now may be written as follows:

$$F(x) = 1130\,[\varphi_0(x) + .0258\,\varphi_3(x) + .0168\,\varphi_4(x)]$$

where, with $\lambda_1$ denoting the first semi-invariant found in
the scheme,

$$\varphi_0(x) = \frac{1}{1.624\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\lambda_1}{1.624}\right)^2}.$$

14. COMPARISON BETWEEN OBSERVED AND THEORETICAL VALUES

The next step is now to work out the numerical values of F(x)
for various values of x and compare such values with the ones
originally observed. This process is shown in detail in the
following scheme.

Column (1) gives the values of the variate x reckoned from the
provisional origin, or the centre of the age interval 65-69.
Column (2) is x less the first semi-invariant, whereby the
origin is shifted to the mean or $\lambda_1$. Column (3) represents
the final linear transformation: $z = (x - \lambda_1) : \sigma$.

Columns (4), (5) and (6) are copied directly from the standard
tables of Jørgensen or Charlier. Column (7) is (5) multiplied by
0.0258 or the product $-[c_3\varphi_3(z)] : 3!$, while (8) is
$[c_4\varphi_4(z)] : 4!$.

Column (9) is the sum of (4), (7), and (8). If we now distribute
the area $N = s_0$ or 1130 pro rata according to (9), we finally
reach the theoretical frequency distribution expressed in 5-year
age intervals and shown in column (10), alongside which we have
inserted the originally observed values. Evidently the fit is
satisfactory. It will be noted that the final frequency series
is expressed in units of 5-year age intervals. This, however, is
only a formal representation. By subdividing the unit intervals
of column (1) in 5 equal parts, and by computing all the other
columns accordingly, we get the theoretical frequency series
expressed in single year age intervals.
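
The columns of the scheme can be imitated in a few lines. The sketch below rests on several assumptions that are not in the text: the derivatives are computed from the identities $\varphi_3(z) = -(z^3-3z)\varphi_0(z)$ and $\varphi_4(z) = (z^4-6z^2+3)\varphi_0(z)$ instead of being read from tables, the coefficients are recomputed from the pensioner data rather than taken as the printed 0.0258 and 0.0168, and "over 90" is treated as the class 90-94. Its output therefore need not agree with the printed scheme to the last figure.

```python
from math import exp, pi, sqrt

freqs = [1, 6, 17, 48, 118, 224, 286, 248, 128, 38, 13, 3]
xs = list(range(-6, 6))          # 5-year classes, x = 0 is 65-69


def phi0(z):
    return exp(-z * z / 2) / sqrt(2 * pi)


def phi3(z):                     # third derivative of phi0
    return -(z**3 - 3 * z) * phi0(z)


def phi4(z):                     # fourth derivative of phi0
    return (z**4 - 6 * z * z + 3) * phi0(z)


def fitted_frequencies(xs, freqs):
    n = sum(freqs)
    m = [sum(f * x**r for x, f in zip(xs, freqs)) / n for r in range(5)]
    l1 = m[1]
    l2 = m[2] - m[1] ** 2
    l3 = m[3] - 3 * m[2] * m[1] + 2 * m[1] ** 3
    l4 = (m[4] - 4 * m[3] * m[1] - 3 * m[2] ** 2
          + 12 * m[2] * m[1] ** 2 - 6 * m[1] ** 4)
    sigma = sqrt(l2)
    g3, g4 = l3 / sigma**3, l4 / sigma**4      # semi-invariants in z units
    col9 = []
    for x in xs:
        z = (x - l1) / sigma                   # column (3)
        col9.append(phi0(z) - g3 / 6 * phi3(z) + g4 / 24 * phi4(z))
    total = sum(col9)
    return [n * v / total for v in col9]       # column (10), pro rata


fit = fitted_frequencies(xs, freqs)
```

The pro rata step guarantees that the fitted frequencies add up to the observed total of 1130, whatever the coefficients are.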

15. THE PRINCIPLE OF METHOD OF LEAST SQUARES

The following paragraph purports to give a brief exposition of
the determination of the coefficients in the Gram or
Laplacean-Charlier series in the sense of the method of least
squares as a strict problem of maxima and minima, wholly
independent of the connection between the method of least
squares and the error laws of precision measurements.¹

¹ In the following demonstration I am adhering to
the brief and lucid exposition of the Argentinean actuary,
U. Broggi, in his excellent Traité d'Assurances sur la Vie,

The simple problem in maxima and minima which forms the
fundamental basis of the method of least squares is the
following: Let m unknown quantities be determined by
observations in such a manner that they are not observed
directly but enter into certain known functional relations,
$f_i(x_1, x_2, x_3, \dots, x_m)$, containing the unknown
independent variables, $x_1, x_2, x_3, \dots, x_m$. Let
furthermore the number of observations on such functional
relations be n (where n is greater than m). The problem is then
to determine the most plausible system of the values of the
unknowns from the observed system

$$f_1(x_1, x_2, x_3, \dots, x_m) = o_1$$
$$f_2(x_1, x_2, x_3, \dots, x_m) = o_2$$
$$\dots$$
$$f_n(x_1, x_2, x_3, \dots, x_m) = o_n$$

where $f_1, f_2, \dots, f_n$ are the known functional relations
and $o_1, o_2, \dots, o_n$ their observed values. Such equations
are known as observation equations.

In order to further simplify our problem we
shall also assume that

1 All the equations of the system have the
same weight, and

2 All the equations are reduced to linear form.

By these assumptions the problem is reduced
to finding m unknowns from n linear equations

$$a_1x_1 + b_1x_2 + \dots = o_1$$
$$a_2x_1 + b_2x_2 + \dots = o_2$$
$$\dots$$
$$a_nx_1 + b_nx_2 + \dots = o_n$$

Since n is greater than m we find the problem
over-determined, and we therefore seek to determine
the unknown quantities, $x_1, x_2, \dots, x_m$, in
such a way that the sum of the squares of the
differences between the functional relations and
the observed values, o, becomes a minimum. This
implies that the expression

$$\sum_{i=1}^{n}(a_ix_1 + b_ix_2 + \dots - o_i)^2 = \varphi(x_1, x_2, \dots, x_m)$$

must be a minimum, or the simultaneous existence
of the equations

$$\frac{\partial\varphi}{\partial x_1} = 0,\quad \frac{\partial\varphi}{\partial x_2} = 0,\quad \dots,\quad \frac{\partial\varphi}{\partial x_m} = 0. \qquad \text{(I)}$$

If we now introduce the following notation

$$a_ix_1 + b_ix_2 + \dots - o_i = \lambda_i \quad\text{for } i = 1, 2, 3, \dots, n,$$

the m equations in the above system (I) evidently
take on the following form

$$\lambda_1a_1 + \lambda_2a_2 + \dots + \lambda_na_n = 0$$
$$\lambda_1b_1 + \lambda_2b_2 + \dots + \lambda_nb_n = 0$$
$$\dots$$

If we now again re-substitute the expressions
for $\lambda$ in terms of the linear relations

$$a_ix_1 + b_ix_2 + \dots - o_i = \lambda_i, \quad\text{for } i = 1, 2, 3, \dots, n,$$

and collect the coefficients of $x_1, x_2, \dots, x_m$, these
equations may be expressed in the following symbolical form:

$$[aa]x_1 + [ab]x_2 + \dots - [ao] = 0$$
$$[ab]x_1 + [bb]x_2 + \dots - [bo] = 0$$
$$\dots$$
$$[ak]x_1 + [bk]x_2 + \dots + [kk]x_m - [ko] = 0$$

where

$$[aa] = a_1^2 + a_2^2 + \dots$$
$$[ab] = a_1b_1 + a_2b_2 + \dots$$

is the Gaussian notation for the homogeneous sum
products.

The above equations are known as normal
equations, and it is readily seen that there is one
normal equation corresponding to each unknown.
Our problem is therefore reduced to the solution
of a system of simultaneous linear equations of m
unknowns. If m is a small number, that is, if there
are only two or three unknowns, the solution can be
carried out by simple algebraic methods or
determinants. If the number of unknowns is large
these methods become very laborious and impractical.
It is one of the achievements of the great German
mathematician, Gauss, to have given us a method of
solution which reduces this labor to a minimum
and which proceeds along well defined systematic
and practical lines. The method is known as the
Gaussian algorithm of successive elimination.
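
The formation of the sum products can be sketched as follows. The function name and the small data set are hypothetical; the example has two unknowns, so the normal equations are solved by determinants, which is the case the text calls simple.

```python
def normal_equations(rows, obs):
    """Sum products [aa], [ab], ... and [ao], [bo], ... of equal-weight
    linear observation equations  a_i*x1 + b_i*x2 + ... = o_i."""
    m = len(rows[0])
    A = [[sum(r[j] * r[k] for r in rows) for k in range(m)] for j in range(m)]
    rhs = [sum(r[j] * o for r, o in zip(rows, obs)) for j in range(m)]
    return A, rhs


# Hypothetical example: four observation equations, two unknowns.
rows = [(1, 0), (1, 1), (1, 2), (1, 3)]     # coefficients (a_i, b_i)
obs = [1, 3, 5, 7]                          # observed values o_i
A, rhs = normal_equations(rows, obs)

# Two unknowns: solution by determinants.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
x1 = (rhs[0] * A[1][1] - A[0][1] * rhs[1]) / det
x2 = (A[0][0] * rhs[1] - A[1][0] * rhs[0]) / det
```

The matrix of sum products is symmetric, which is why only its upper triangle needs to be written down in the schedules.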



16. GAUSS' SOLUTION OF NORMAL EQUATIONS

For the sake of simplicity we shall limit ourselves to a
system of four normal equations of the form

$$[aa]x_1 + [ab]x_2 + [ac]x_3 + [ad]x_4 - [ao] = 0$$
$$[ab]x_1 + [bb]x_2 + [bc]x_3 + [bd]x_4 - [bo] = 0$$
$$[ac]x_1 + [bc]x_2 + [cc]x_3 + [cd]x_4 - [co] = 0$$
$$[ad]x_1 + [bd]x_2 + [cd]x_3 + [dd]x_4 - [do] = 0$$

The generalization to an arbitrary number of
unknowns offers no difficulties, however.

On account of their symmetrical form the
above equations may also be written in the more
convenient form, viz.:

$$[aa]x_1 + [ab]x_2 + [ac]x_3 + [ad]x_4 - [ao] = 0$$
$$[bb]x_2 + [bc]x_3 + [bd]x_4 - [bo] = 0$$
$$[cc]x_3 + [cd]x_4 - [co] = 0$$
$$[dd]x_4 - [do] = 0$$

From the first equation we find

$$x_1 = \frac{[ao]}{[aa]} - \frac{[ab]}{[aa]}x_2 - \frac{[ac]}{[aa]}x_3 - \frac{[ad]}{[aa]}x_4.$$

Substituting this value in the following equations
and by the introduction of the new symbol

$$[ik] - \frac{[ai]}{[aa]}[ak] = [ik.1]$$

we now obtain a new system of equations of a
lower order and of the form

$$[bb.1]x_2 + [bc.1]x_3 + [bd.1]x_4 - [bo.1] = 0$$
$$[cc.1]x_3 + [cd.1]x_4 - [co.1] = 0$$
$$[dd.1]x_4 - [do.1] = 0$$

Solving for $x_2$ we have

$$x_2 = \frac{[bo.1]}{[bb.1]} - \frac{[bc.1]}{[bb.1]}x_3 - \frac{[bd.1]}{[bb.1]}x_4.$$

Substituting in the following equations and
writing

$$[ik.1] - \frac{[bi.1]}{[bb.1]}[bk.1] = [ik.2]$$

we have

$$[cc.2]x_3 + [cd.2]x_4 = [co.2]$$
$$[dd.2]x_4 = [do.2]$$

or

$$x_3 = \frac{[co.2]}{[cc.2]} - \frac{[cd.2]}{[cc.2]}x_4.$$

Moreover, by writing

$$[ik.2] - \frac{[ci.2]}{[cc.2]}[ck.2] = [ik.3],$$

we have finally

$$[dd.3]x_4 = [do.3]$$

This gives us the final reduced normal equation
of the lowest order. By successive substitution
we therefore have:

$$x_4 = \frac{[do.3]}{[dd.3]}$$

$$x_3 = \frac{[co.2]}{[cc.2]} - \frac{[cd.2]}{[cc.2]}x_4$$

$$x_2 = \frac{[bo.1]}{[bb.1]} - \frac{[bc.1]}{[bb.1]}x_3 - \frac{[bd.1]}{[bb.1]}x_4$$

$$x_1 = \frac{[ao]}{[aa]} - \frac{[ab]}{[aa]}x_2 - \frac{[ac]}{[aa]}x_3 - \frac{[ad]}{[aa]}x_4$$

as the ultimate solution of the unknowns.
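
The successive reductions can be written as a short routine. This is a sketch under stated assumptions, not Gauss's schedule as printed: the function name and the 4 by 4 test system are hypothetical, the right hand column holds the sums [ao], [bo], ... moved across the equality sign, and no pivoting is done, on the assumption that the leading sum products [aa], [bb.1], [cc.2], ... do not vanish.

```python
def gauss_solve(A, rhs):
    """Solve normal equations A x = rhs by successive elimination.

    After the p-th pass the rows below p hold the reduced sum products
    [ik.p] of the text; back-substitution then yields x_m, ..., x_1."""
    m = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]   # augmented copy
    for p in range(m):
        for i in range(p + 1, m):
            f = M[i][p] / M[p][p]                     # e.g. [ab]/[aa]
            for k in range(p, m + 1):
                M[i][k] -= f * M[p][k]                # [ik] - f*[ak] = [ik.1]
    x = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, m))
        x[i] = (M[i][m] - tail) / M[i][i]
    return x


# Hypothetical symmetric system whose solution is (1, 2, 3, 4).
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]
rhs = [6.0, 10.0, 12.0, 11.0]
x = gauss_solve(A, rhs)
```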
17. ARITHMETICAL APPLICATION OF METHOD

The example in paragraph 13 gave an illustration of the
application of the method of moments. As previously stated this
method works quite well in cases of moderate skewness, but is
less successful in extremely skew curves and where the excess
is large. We shall now give an illustration of the calculation
of the parameters by the method of least squares. The example
we choose is the well-known statistical series by the
distinguished Dutch botanist, de Vries, on the number of petals
in the flowers of Ranunculus bulbosus. This is also one of the
classical examples of Karl Pearson in his celebrated original
memoirs on skew variation. Although the observations of de
Vries lend themselves more readily to the method of logarithmic
transformation, which we shall discuss in a following chapter,
we have deliberately chosen to use them here for two specific
reasons. Firstly it is a most striking illustration in
refutation of the immature criticism of the Gram-Charlier
series by a certain young and very incautious American actuary,
Mr. M. Davis, who has gone on record with the positive
statement, "that the Charlier series fails completely in case
of appreciable skewness". Secondly (and this is the more
important reason) it offers an excellent drill for the student
in the practical applications of the method of least squares
because it gives in a very brief compass all the essential
arithmetical details. The observations of de Vries are as
follows:



No. of petals      x      F(x) = o_x
      5            0         133
      6            1          55
      7            2          23
      8            3           7
      9            4           2
     10            5           2


where F(x) denotes the absolute frequencies. The
observed frequency distribution is well nigh as
skew as it can be and represents in fact a one-sided
curve, and should therefore (if the statement
by Mr. Davis is correct) show an absolute
defiance to a graduation by the Gram-Charlier
series.

The process we shall use in the attempted
mathematical representation of the above series is
a combination of the method of semi-invariants
and the method of least squares. Following
Thiele's advice we determine the first two
semi-invariants in the generating function directly from
the observations while the coefficients of this
function and its derivatives are determined by
the least square method.

Choosing the provisional origin at 5 petals (x = 0 in the
table above), we obtain the following values for the crude
moments:

$s_0 = 222$, $s_1 = 140$, $s_2 = 292$, $s_3 = 806$, $s_4 = 2{,}752$,
$s_5 = 10{,}790$, $s_6 = 46{,}072$, $s_7 = 207{,}226$,

from which we find that

$\lambda_0 = 1$, $\lambda_1 = 0.631$, $\lambda_2 = 0.917$, $\lambda_3 = 1.644$,
$\lambda_4 = 3.377$, $\lambda_5 = 5.972$, $\lambda_6 = -2.911$,
$\lambda_7 = 122.638$.
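
The crude moments and the leading semi-invariants can be recomputed from the table of observations. The sketch below is not the author's working: it uses the standard recursion $\lambda_n = m_n - \sum_{k=1}^{n-1}\binom{n-1}{k-1}\lambda_k m_{n-k}$ in place of the explicit formulas, and the function names are illustrative. Results of this kind agree with hand computation only as far as the rounding of the latter allows, so the higher orders are best compared with some tolerance.

```python
from math import comb

petals = [5, 6, 7, 8, 9, 10]
freqs = [133, 55, 23, 7, 2, 2]
xs = [p - 5 for p in petals]        # provisional origin at 5 petals


def power_sums(xs, freqs, order):
    """Crude moments s_0..s_order about the provisional origin."""
    return [sum(f * x**r for x, f in zip(xs, freqs)) for r in range(order + 1)]


def semi_invariants(s):
    """Semi-invariants from the crude moments by recursion."""
    m = [v / s[0] for v in s]
    lam = [1.0] + [0.0] * (len(m) - 1)   # lam[0] = 1 is the relative area
    for n in range(1, len(m)):
        lam[n] = m[n] - sum(comb(n - 1, k - 1) * lam[k] * m[n - k]
                            for k in range(1, n))
    return lam


s = power_sums(xs, freqs, 7)
lam = semi_invariants(s)
```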

All these semi-invariants with the exception of the first two
are, however, so greatly influenced by random sampling in the
small observation series that it is hopeless to use them in the
determination of the constants in the Gram-Charlier series. In
fact an actual calculation does not give a very good result
beyond that of a first rough approximation. The generating
function, on the other hand, may be expressed by the aid of the
first two semi-invariants as follows:

$$\varphi_0(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2 : 2}$$

where z is given by the linear transformation:

$$z = (x - 0.631) : 0.9576 \qquad (\sqrt{\lambda_2} = 0.9576).$$

We now propose to express the observed function
F(x) or $\varphi(z)$ by a Gram-Charlier series of
the form:

$$F(x) = \varphi(z) = k_0\varphi_0(z) + k_3\varphi_3(z) + k_4\varphi_4(z).$$

In this equation we know the values of the generating function
and its derivatives for various values of the variate z as
found in the tables of Jørgensen and Charlier, while the
quantities k are unknowns. On the other hand we know 6 specific
values of F(x) as directly observed in de Vries's observation
series. We are thus dealing with a system of typical linear
observation equations of the forms described in paragraphs 15
and 16, which lend themselves so admirably to treatment by the
method of least squares.

From the above linear relation between x and
z we can directly compute the following table for
the transformed variate z.



  x          z
  0       -0.688
  1       +0.402
  2       +1.493
  3       +2.583
  4       +3.674
  5       +4.764



The numerical values of $\varphi_0(z)$ and its derivatives
as corresponding to the above values of z can
be taken directly from the standard tables of Jørgensen
and Charlier. We may therefore write
down the following observation equations:

    φ0          φ3          φ4
 .3148 k0  -  .5472 k3  +  .1207 k4  -  133  =  0
 .3679 k0  +  .4198 k3  +  .7566 k4  -   55  =  0
 .1308 k0  +  .1506 k3  -  .7073 k4  -   23  =  0
 .0145 k0  -  .1346 k3  +  .1062 k4  -    7  =  0
 .0005 k0  -  .0180 k3  +  .0486 k4  -    2  =  0
 .0001 k0  -  .0005 k3  +  .0020 k4  -    2  =  0

for which we now propose to determine the unknown
values of k by the least square method.
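
Where the printed tables are not at hand, the entries of the observation equations can be computed. The sketch below is an illustration under one assumption that goes beyond the text: the derivatives are obtained from the identities $\varphi_3(z) = -(z^3-3z)\varphi_0(z)$ and $\varphi_4(z) = (z^4-6z^2+3)\varphi_0(z)$. The variable names are hypothetical, and the computed figures may differ from the tabular ones in the last decimal.

```python
from math import exp, pi, sqrt


def phi0(z):
    """Generating function: the normal curve in standard units."""
    return exp(-z * z / 2) / sqrt(2 * pi)


def phi3(z):
    """Third derivative of phi0."""
    return -(z**3 - 3 * z) * phi0(z)


def phi4(z):
    """Fourth derivative of phi0."""
    return (z**4 - 6 * z * z + 3) * phi0(z)


z_values = [-0.688, 0.402, 1.493, 2.583, 3.674, 4.764]
observed = [133, 55, 23, 7, 2, 2]

# One observation equation per row: coefficients of k0, k3, k4 and o.
rows = [(phi0(z), phi3(z), phi4(z), o) for z, o in zip(z_values, observed)]
```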

While this method may of course be applied directly to the
above data, it will generally be found of advantage to start
with some approximate values of the k's. It is found in
practice that this approximate step saves considerable labour
in the formation and ultimate solution of the normal equations.

Although the first approximation in the case of numerous
unknowns must be in the nature of a more or less shrewd guess,
which facility can only be attained by constant practice in
routine mathematical computing, we are, however, in this
specific instance able to tell something about the nature of
the coefficients from purely a priori considerations. We know
for instance from the form of the Gram-Charlier series that the
coefficient $k_0$ of the generating function must be nearly equal
to the area of the curve, which in this particular instance is
222. Moreover, a mere glance at the observed series tells us
that it has a decidedly large skewness in negative direction
from the mean coupled with a tendency of being "top heavy",
indicating positive excess. We can therefore assume as a first
approximation that the coefficients of the derivatives of
uneven order are negative and the coefficients of derivatives
of even order are positive.

From such purely common sense a priori considerations we
therefore guess the following first approximations, viz.:

$$k_0' = 222, \qquad k_3' = -25, \qquad k_4' = 30.$$

The probable values of the various k's may be
written as

$$k_i = r_ik_i' \quad\text{for } i = 0, 3, 4,$$

and our problem is therefore to find the correction
factor $r_i$ with which the approximate value $k_i'$
must be multiplied so as to give $k_i$.

Applying the various values of $k_i'$ to the
original observation equations on page 64 we obtain
the following schedule for the numerical factors
of $r_i$.



     a         b         c          o          s
   69.9     +13.7     + 3.6     -133.0     - 45.8
   81.7     -10.5     +22.7     - 55.0     + 38.9
   29.1     - 3.8     -21.2     - 23.0     - 18.9
    3.3     + 3.4     + 3.2     -  7.0     +  2.9
    0.1     + 0.5     + 1.5     -  2.0     +  0.1
    0.0     + 0.0     + 0.0     -  2.0     -  2.0
  -------------------------------------------------
  184.1     + 3.3     + 9.8     -222.0     - 24.8

where the additional control column s serves as a
check.

The subsequent formation of the various sum- 
products and normal equations is shown in the 
following schedules together with the s columns 
as a check. 



      aa          ab          ac          ao          as
  +  4,886    +    958    +    252    -  9,297    -  3,201
  +  6,675    -    858    +  1,855    -  4,494    +  3,178
  +    847    -    111    -    617    -    669    -    550
  +     11    +     11    +     11    -     23    +     10
         0           0           0           0           0
         0           0           0           0           0
  ---------------------------------------------------------
  + 12,419           0    +  1,501    - 14,483    -    563

      bb          bc          bo          bs
  +    188    +     49    -  1,822    -    628
  +    110    -    238    +    578    -    408
  +     14    +     81    +     87    +     72
  +     12    +     11    -     24    +     10
         0    +      1    -      1           0
         0           0           0           0
  ---------------------------------------------
  +    324    -     96    -  1,182    -    954

      cc          co          cs
  +     13    -    479    -    165
  +    515    -  1,249    +    883
  +    449    +    488    +    401
  +     10    -     22    +      9
  +      2    -      3    +      1
         0           0           0
  ---------------------------------
  +    989    -  1,265    +  1,129

We may now write the normal equations in
schedule form as follows:

ORIGINAL NORMAL EQUATIONS

(a)   +12,419        +0        +1,501     -14,483
(1)                  +0        +0         -0
(b)                  +324      -96        -1,182
(2)                            +181       -1,750
(c)                            +989       -1,265
(3)                  +.00000   +.12086    -1.16617

The sum-products from the observation equations
are shown in the rows marked (a), (b), (c).
The row marked (3) and printed in italics is
formed by dividing each of the figures in row (a)
by 12,419. The row marked (1) contains the
products of the figures in row (a) multiplied by
the factor .00000. All these products happen in
this case to be equal to zero. Row (2) is the
products of the factor 0.12086 and the figures in
row (a).

We next subtract row (1) from row (b), and row
(2) from row (c), which results in the following
schedule, which is known as the first reduction
equation.

FIRST REDUCTION EQUATIONS

(a)   +324       -96        -1,182
(1)              +28        +350
(b)              +808       +485
(2)              -.29626    -3.64814

The above equations are treated in a similar
manner as the original normal equations, and we
have therefore the 2nd reduction equation of the
form:

SECOND REDUCTION EQUATION

      +780       +135

The solution for the unknown r's may now
be shown as follows:

$$r_4 = -135 : 780 = -.17308$$
$$r_3 = 3.64814 - (-.29626)(-.17308) = 3.59637$$
$$r_0 = 1.16617 - (0.0)(3.59637) - (.12086)(-.17308) = 1.18709.$$

From which we find:

$$k_0 = 263.5, \qquad k_3 = -89.9, \qquad k_4 = -5.1$$
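
The schedules above can be cross-checked by solving the three normal equations directly. This sketch is illustrative and not the author's working: the sum products are copied from row (a), (b), (c) with the observation column moved to the right hand side, the solver is plain elimination without pivoting, and the names are hypothetical. Because the hand schedule rounds every intermediate figure, the machine result agrees with the printed correction factors only to about three significant figures.

```python
def solve(A, rhs):
    """Successive elimination followed by back-substitution."""
    m = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for p in range(m):
        for i in range(p + 1, m):
            f = M[i][p] / M[p][p]
            for k in range(p, m + 1):
                M[i][k] -= f * M[p][k]
    x = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, m))
        x[i] = (M[i][m] - tail) / M[i][i]
    return x


# Sum products [aa], [ab], [ac]; [bb], [bc]; [cc] from the schedules.
A = [[12419.0, 0.0, 1501.0],
     [0.0, 324.0, -96.0],
     [1501.0, -96.0, 989.0]]
rhs = [14483.0, 1182.0, 1265.0]            # -[ao], -[bo], -[co]

first_guess = {0: 222.0, 3: -25.0, 4: 30.0}
r0, r3, r4 = solve(A, rhs)
k = {i: r * first_guess[i] for i, r in zip((0, 3, 4), (r0, r3, r4))}
```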

Applying these factors to the values of $\varphi_0(z)$,
$\varphi_3(z)$ and $\varphi_4(z)$ we obtain the following
result:¹

  k0φ0      k3φ3      k4φ4      Σ kiφi     Obs.
  82.9     +49.2      -0.6      131.5      133
  96.9     -37.7      -3.9       55.3       55
  34.5     -13.5      +3.6       24.6       23
   3.8     +12.1      -0.5       15.4        7
   0.1     + 1.0      -0.2        0.9        2
   0.0     + 0.0      -0.0        0.0        2

¹ For a closer approximation see my Mathematical
Theory of Probabilities (Second Edition, New York, 1921).
18. TRANSFORMATION OF THE VARIATE

While it is always possible to express all frequency curves by
an expansion in Hermite polynomials, the numerical labor when
carried on by the method of least squares often involves a
large amount of arithmetical work if we wish to retain more
than four or five terms of the series. Other methods lessening
the arithmetical work and making the actual calculations
comparatively simple have been offered by several authors and
notably by Thiele, who in his works discusses several such
methods. Among those we may mention the method of the so-called
free functions and orthogonal substitution, the method of
correlates and the adjustment by elements. The chapters on
these methods in Thiele's work are among some of the most
important, but also some of the most difficult in the whole
theory of observations and have not always been understood and
appreciated by the mathematicians, chiefly on account of
Thiele's peculiar style of writing. A close study of the Danish
scholar's investigations is, however, well worth while, and
Thiele's work along these lines may still in the future become
as epochmaking in the theory of probability as some of the
researches of the great Laplace. The theory of infinite
determinants as used by M. Fredholm in the solution of integral
equations is another powerful tool which offers great
advantages in the way of rapid calculation. All these methods
require, however, that the student must be thoroughly familiar
with the difficult theory upon which such methods rest, and
they have for this reason been omitted in an elementary work
such as the present treatise.

We wish, however, to mention another method which in the
majority of cases will make it possible to employ the Gram or
Laplacean-Charlier curves in cases with extreme skewness or
excess. We have here reference to the method of logarithmic
transformation of the variate, x.



19. THE GENERAL TRANSFORMATION

One of the simplest transformations is the previously mentioned
linear transformation of the form $z = f(x) = ax + b$, by which
we can make two constants, $c_1$ and $c_2$, vanish. Other
transformations suggest themselves, however, such as
$f(x) = ax^2 + bx + c$, $f(x) = \sqrt{x}$, $f(x) = \log x$ and so
forth. For this reason I propose to give a brief development of
the general method of transformations of the statistical
variates, mainly following the methods of Charlier and
Jørgensen.

Stated in its most general form our problem is: If a frequency
curve of a certain variate is given by F(x), what will be the
frequency curve of a certain function of x, say f(x)?

The equation of the frequency curve is $y = F(x)$, which means
that $F(x)dx$ is the probability that x falls in the interval
between $x - \tfrac{1}{2}dx$ and $x + \tfrac{1}{2}dx$. The
probability that a new variate z after the transformation
$z = f(x)$, or $x = \chi(z)$, falls in the interval between
$z - \tfrac{1}{2}dz$ and $z + \tfrac{1}{2}dz$ is therefore simply

$$F[\chi(z)]\,\chi'(z)\,dz = F(x)\,dx,$$

which gives in symbolic form the equation of the
transformed frequency curve.

The frequency for $z = f(x)$ is of course the same as for x.
The ordinates of the frequency curve, or rather the areas
between corresponding ordinates, are therefore not changed, but
the abscissa axis is replaced by f(x). Equidistant intervals of
x will therefore not as a rule (except in the linear
transformation) correspond to equidistant intervals of f(x).

If, for instance, the frequency curve F(x) is
the Laplacean normal curve

$$F(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2 : 2\sigma^2}$$

and if we let $z = f(x) = x^2$ or $x = \sqrt{z}$, we have
evidently

$$F(z) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-z : 2\sigma^2}\,\frac{1}{2\sqrt{z}}.$$
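
The rule $F[\chi(z)]\chi'(z)\,dz = F(x)\,dx$ can be tried numerically. The sketch below is a hedged illustration: the frequency curve $e^{-x}$ on $x > 0$ is a hypothetical example chosen because its area is exactly one, the helper names are invented, and the area of the transformed curve is found by the trapezoidal rule.

```python
from math import exp


def transformed_curve(F, chi, dchi):
    """Frequency curve of z when x = chi(z): F[chi(z)] * chi'(z)."""
    return lambda z: F(chi(z)) * dchi(z)


def trapezoid(g, lo, hi, steps=100_000):
    """Area under g between lo and hi by the trapezoidal rule."""
    h = (hi - lo) / steps
    inner = sum(g(lo + i * h) for i in range(1, steps))
    return h * (0.5 * g(lo) + inner + 0.5 * g(hi))


# Hypothetical frequency curve with unit area on x > 0.
F = lambda x: exp(-x)

# Logarithmic transformation: z = log x, x = e^z, dx = e^z dz.
G = transformed_curve(F, exp, exp)
area = trapezoid(G, -30.0, 5.0)
```

The area is unchanged by the transformation, as the text states, although the shape of the curve over the new abscissa axis is different.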



20. LOGARITHMIC TRANSFORMATION

Of the various transformations the logarithmic is of special
importance. It happens that even if the variate x forms an
extremely skew frequency distribution its logarithms will be
nearly normally distributed.

This fact was already noted by the eminent German psychologist,
Fechner, and also mentioned by Bruhns in his
Kollektivmasslehre. But neither Fechner nor Bruhns have given a
satisfactory theoretical explanation of the transformation and
have limited themselves to use it as a practical rule of thumb.

Thiele discusses the method under his adjustment by elements,
but in a rather brief manner. The first satisfactory theory of
logarithmic transformation seems to have been given first by
Jørgensen and later on by Wicksell.¹ Jørgensen first begins
with the transformation of the normal Laplacean frequency
curve. Letting $z = \log x$ and bearing in mind that the
frequency of x equals that of $\log x$ we have

$$z = f(x) = \log x, \quad\text{or}\quad x = \chi(z) = e^z \quad\text{and}\quad dx = e^z\,dz.$$

¹ The law of errors, leading to the geometric mean
as the most probable value of the variate as discovered
by Prof. Dr. Th. N. Thiele in 1867 may, however, be
considered as a forerunner of Jørgensen's work.

The continuous power sums or moments of the rth order around
the lower limit take on the form

$$M_r = \frac{N}{n\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{(r+1)z}\,e^{-\frac{1}{2}\left(\frac{z-m}{n}\right)^2}dz = \frac{N}{n\sqrt{2\pi}}\int_{0}^{\infty} x^r\,e^{-\frac{1}{2}\left(\frac{\log x-m}{n}\right)^2}dx$$

on the assumption that $\log x$ is normally distributed.

The change in the lower limit in the second integral from
$-\infty$ to zero arises simply from the fact that the logarithm
of zero equals minus infinity and the point $-\infty$ is thus by
the transformation moved up to zero.

By a straightforward transformation we may
write the above integral as

$$M_r = \frac{N}{\sqrt{2\pi}}\,e^{m(r+1)+\frac{1}{2}n^2(r+1)^2}\int_{-\infty}^{+\infty} e^{-\frac{1}{2}t^2}\,dt = N\,e^{m(r+1)+\frac{1}{2}n^2(r+1)^2}.$$

Changing from moments to semi-invariants by
means of the well-known relations

$$\lambda_1 = M_1 : M_0$$
$$\lambda_2 = (M_2M_0 - M_1^2) : M_0^2$$
$$\lambda_3 = (M_3M_0^2 - 3M_2M_1M_0 + 2M_1^3) : M_0^3$$
$$\lambda_4 = (M_4M_0^3 - 4M_3M_1M_0^2 - 3M_2^2M_0^2 + 12M_2M_1^2M_0 - 6M_1^4) : M_0^4$$

we have

$$\lambda_0 = M_0 = N\,e^{m+\frac{1}{2}n^2}$$
$$\lambda_1 = e^{m+1.5n^2}$$
$$\lambda_2 = \lambda_1^2\,(e^{n^2} - 1)$$
$$\lambda_3 = \lambda_1^3\,(e^{3n^2} - 3e^{n^2} + 2)$$

These equations give the semi-invariants expressed in terms of
m and n. On the other hand if we know the semi-invariants from
statistical data or are able to determine these semi-invariants
by a priori reasoning we may find the parameters m and n.

21. THE MATHEMATICAL ZERO

A point which we must bear in mind is that the above
semi-invariants on account of the transformation are calculated
around a zero point which corresponds to a fixed lower limit of
the observations.

Very often the observations themselves indicate such a lower
limit beyond which the frequencies of the variate vanish. In
the case of persons engaged in factory work there is in most
countries a well-defined legal age limit below which it is
illegal to employ persons for work. Another example is offered
in the number of alpha particles radiated from certain
radioactive metals. Since the number of particles radiated in a
certain interval of time must either be zero or a whole
positive number it is evident that $-1$ must be the lower limit
because we can have no negative radiations. Analogous limits
exist in the age limit for divorces and in the amount of moneys
assessed in the way of income tax.

The lower limit allows, however, of a more exact mathematical
determination by means of the following simple considerations.
It is evident that this lower limit must fall below the mean
value of the frequency curve. Let us suppose that it is located
at a point, a, located say $\eta$ units in negative direction
from the mean, $M = \lambda_1$, and let us to begin with select
$\lambda_1$ as the origin of the coordinate system, in which case
the first semi-invariant, $\lambda_1$, is equal to zero.
Transferring the origin to a, the first semi-invariant equals
$\eta$, while the semi-invariants of higher order remain the
same as before the transformation, and we have:

$$\lambda_1 = \eta = e^{m+1.5n^2}$$

$$\lambda_2 = \eta^2\,(e^{n^2} - 1) \quad\text{or}\quad e^{n^2} = 1 + \lambda_2 : \eta^2$$

$$\lambda_3 = \eta^3\,(e^{3n^2} - 3e^{n^2} + 2) = \eta^3\left[\left(\frac{\lambda_2}{\eta^2}+1\right)^3 - 3\left(\frac{\lambda_2}{\eta^2}+1\right) + 2\right]$$

which reduces to

$$\lambda_3\eta^3 - 3\lambda_2^2\eta^2 - \lambda_2^3 = 0.$$

The solution of this cubic equation, which has one real and two
imaginary roots, gives us the value of $\eta$ or $\lambda_1 - a$
and thus determines the mathematical zero or lower limit. We
have in fact:

$$n^2 = \log(1 + \lambda_2 : \eta^2) \quad\text{and}\quad m = \log\eta - 1.5n^2,$$

while

$$N = \lambda_0 : e^{m+\frac{1}{2}n^2}.$$
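
The determination of the mathematical zero can be sketched as follows. The function name, the use of bisection, and the round-trip example are assumptions made for illustration; the text itself does not prescribe a method for solving the cubic. The sketch requires $\lambda_3 > 0$, in which case the cubic has exactly one positive root.

```python
from math import exp, log


def mathematical_zero(l2, l3):
    """Return (eta, m, n2) from the second and third semi-invariants.

    eta is the distance of the lower limit below the mean; n2 stands
    for n squared. Requires l3 > 0."""
    f = lambda e: l3 * e**3 - 3 * l2**2 * e**2 - l2**3
    lo, hi = 0.0, 1.0                 # f(0) = -l2^3 is negative
    while f(hi) <= 0:                 # widen until the root is bracketed
        hi *= 2
    for _ in range(200):              # bisection
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    eta = 0.5 * (lo + hi)
    n2 = log(1 + l2 / eta**2)
    m = log(eta) - 1.5 * n2
    return eta, m, n2


# Round trip with hypothetical parameters m = 1, n^2 = 0.16.
eta_true = exp(1.0 + 1.5 * 0.16)
l2 = eta_true**2 * (exp(0.16) - 1)
l3 = eta_true**3 * (exp(0.48) - 3 * exp(0.16) + 2)
eta, m, n2 = mathematical_zero(l2, l3)
```

Given the mean $\lambda_1$ in the original units, the lower limit itself is then $a = \lambda_1 - \eta$.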
22. LOGARITHMICALLY TRANSFORMED FREQUENCY SERIES

We have already shown that the generalized frequency curve
could be written as

$$F(x) = c_0\varphi_0(z) - \frac{c_1\varphi_1(z)}{1!} + \frac{c_2\varphi_2(z)}{2!} - \frac{c_3\varphi_3(z)}{3!} + \dots$$

where the Laplacean probability function

$$\varphi_0(z) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-M)^2}{2\sigma^2}}$$

is the generating function with M and $\sigma$ as its
parameters.

The suggestion now immediately arises to use an analogous
series in the case of the logarithmic transformation. In this
case the frequency curve, F(x), with a lower limit would be
expressed as follows:

$$F(x) = k_0\Phi_0(x) - \frac{k_1}{1!}\Phi_1(x) + \frac{k_2}{2!}\Phi_2(x) - \frac{k_3}{3!}\Phi_3(x) + \dots$$

while the generating function now is

$$\Phi_0(x) = \frac{1}{n\sqrt{2\pi}}\,e^{-\frac{1}{2}\left[\frac{\log x-m}{n}\right]^2}$$

where m and n are the parameters.

Using the usual definition of semi-invariants
we then have

$$s_0\,e^{\frac{\lambda_1\omega}{1!}+\frac{\lambda_2\omega^2}{2!}+\frac{\lambda_3\omega^3}{3!}+\dots} = s_0 + \frac{s_1\omega}{1!} + \frac{s_2\omega^2}{2!} + \frac{s_3\omega^3}{3!} + \dots = \int_0^\infty e^{x\omega}\left[k_0\Phi_0(x) - \frac{k_1\Phi_1(x)}{1!} + \frac{k_2\Phi_2(x)}{2!} - \frac{k_3\Phi_3(x)}{3!} + \dots\right]dx.$$

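The general term in the integral on the right hand side is of the form

$$\frac{(-1)^s k_s}{s!} \int_0^\infty e^{x\omega} \Phi_s(x)\,dx$$

where the integral may be evaluated by partial integration as follows:

$$\int_0^\infty e^{x\omega} \Phi_s(x)\,dx = \left[ e^{x\omega} \Phi_{s-1}(x) \right]_0^\infty - \omega \int_0^\infty e^{x\omega} \Phi_{s-1}(x)\,dx.$$

Since both $\Phi_0(x)$ and all its derivatives are supposed to vanish for $x = 0$ and $x = \infty$, the first term to the right becomes zero and

$$\int_0^\infty e^{x\omega} \Phi_s(x)\,dx = -\omega \int_0^\infty e^{x\omega} \Phi_{s-1}(x)\,dx.$$

By successive integrations we then obtain the following recursion formula

$$(-\omega)^1 \int_0^\infty e^{x\omega} \Phi_{s-1}(x)\,dx = (-\omega)^2 \int_0^\infty e^{x\omega} \Phi_{s-2}(x)\,dx$$

$$(-\omega)^2 \int_0^\infty e^{x\omega} \Phi_{s-2}(x)\,dx = (-\omega)^3 \int_0^\infty e^{x\omega} \Phi_{s-3}(x)\,dx$$

$$\dots$$

$$(-\omega)^{s-1} \int_0^\infty e^{x\omega} \Phi_1(x)\,dx = (-\omega)^s \int_0^\infty e^{x\omega} \Phi_0(x)\,dx.$$

Or finally

$$\int_0^\infty e^{x\omega} \Phi_s(x)\,dx = (-\omega)^s \int_0^\infty e^{x\omega} \Phi_0(x)\,dx.$$

Expanding $e^{x\omega}$ in a power series we have

$$\int_0^\infty e^{x\omega} \Phi_s(x)\,dx = \frac{(-\omega)^s}{n \sqrt{2\pi}} \int_0^\infty \left[ 1 + x\omega + \frac{x^2 \omega^2}{2!} + \frac{x^3 \omega^3}{3!} + \dots \right] e^{-\frac{1}{2} \left[ \frac{\log x - m}{n} \right]^2} dx.$$

The general term in this expansion is of the form

$$\frac{(-\omega)^s \omega^r}{n \sqrt{2\pi}\; r!} \int_0^\infty x^r e^{-\frac{1}{2} \left[ \frac{\log x - m}{n} \right]^2} dx$$

which according to the formulas given on page 74 reduces to:

$$(-\omega)^s e^{m(r+1) + \frac{1}{2} n^2 (r+1)^2}\, \omega^r : r!$$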
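The reduction just quoted rests on the closed form of the integral of $x^r \Phi_0(x)$. The following is a minimal numerical sketch of that identity, not part of the original computation: the function names, the trapezoidal rule, and the parameter values $m = 1.0$, $n = 0.25$ are assumptions chosen only for illustration.

```python
import math

def phi0_moment_numeric(r, m, n, steps=20000):
    """Trapezoidal estimate of the integral of x**r * Phi0(x) over x > 0.

    With t = log x the integrand becomes exp((r + 1) * t) times a normal
    density in t with mean m and standard deviation n.
    """
    centre = m + (r + 1) * n * n      # peak of the transformed integrand
    lo, hi = centre - 12.0 * n, centre + 12.0 * n
    h = (hi - lo) / steps
    norm = 1.0 / (n * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp((r + 1) * t - 0.5 * ((t - m) / n) ** 2)
    return total * h * norm

def phi0_moment_closed(r, m, n):
    """Closed form quoted in the text: exp(m(r+1) + n^2 (r+1)^2 / 2)."""
    return math.exp(m * (r + 1) + 0.5 * n * n * (r + 1) ** 2)

m, n = 1.0, 0.25   # arbitrary illustrative parameters, not taken from the text
for r in range(4):
    print(r, phi0_moment_numeric(r, m, n), phi0_moment_closed(r, m, n))
```

The two printed columns should agree to many decimal places for each power $r$, which is all the check is meant to show.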
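Hence we may write

$$\int_0^\infty e^{x\omega} \Phi_s(x)\,dx = (-\omega)^s \sum_{r=0}^{r=\infty} e^{m(r+1) + \frac{1}{2} n^2 (r+1)^2}\, \omega^r : r!$$

Consequently the relation between the semi-invariants and the frequency function

$$F(x) = k_0 \Phi_0(x) - \frac{k_1}{1!} \Phi_1(x) + \frac{k_2}{2!} \Phi_2(x) - \frac{k_3}{3!} \Phi_3(x) + \dots$$

can be expressed by the following recursion formula

$$s_0 e^{\frac{\lambda_1 \omega}{1!} + \frac{\lambda_2 \omega^2}{2!} + \frac{\lambda_3 \omega^3}{3!} + \dots} = \sum_{v=0}^{v=\infty} \frac{s_v \omega^v}{v!} = \sum_{s=0}^{s=\infty} \sum_{r=0}^{r=\infty} \frac{k_s\, \omega^{s+r}}{s!\; r!}\, e^{m(r+1) + \frac{1}{2} n^2 (r+1)^2}$$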



The constants k are here expressed in terms of 
the unadjusted moments or power sums, s. It is 
readily seen that the Sheppard corrections for 
adjusted moments, M, also apply in this case. 
We are, therefore, able to write down the values of the $k$'s from the above recursion formula in the following manner

$$M_0 = k_0 e^{m + \frac{1}{2} n^2}$$

$$M_1 = k_1 e^{m + \frac{1}{2} n^2} + k_0 e^{2m + 2n^2}$$

$$M_2 = k_2 e^{m + \frac{1}{2} n^2} + 2 k_1 e^{2m + 2n^2} + k_0 e^{3m + 4.5 n^2}$$

$$\dots$$

It is easy to see that it is not possible to determine the generating function's parameters $m$ and $n$ from the observations. These parameters, like $M$ and $\sigma$ in the case of the Laplacean normal probability curve, must be chosen arbitrarily. If $m$ and $n$ are selected so as to make $k_1$ and $k_2$ vanish we have

$$M_0 = k_0 e^{m + \frac{1}{2} n^2}$$

$$M_1 = k_0 e^{2m + 2n^2}$$

$$M_2 = k_0 e^{3m + 4.5 n^2}$$

the solution of which gives

$$e^{n^2} = \frac{M_0 M_2}{M_1^2}, \qquad e^{2m} = \frac{M_1^8}{M_0^5 M_2^3}, \qquad k_0 = \frac{M_0^3 M_2}{M_1^3}$$

while 
This theory requires the computation of a set of tables of the generating function

$$\Phi_0(x) = \frac{1}{n \sqrt{2\pi}} e^{-\frac{1}{2} \left[ \frac{\log x - m}{n} \right]^2}$$

and its derivatives. For $\Phi_0(x)$ itself we may of course use the ordinary tables for the normal curve $\varphi_0(z)$ when we consider

$$z = \frac{\log x - m}{n}.$$

I have calculated a set of tables of the derivatives of $\Phi_0(x)$ and hope to be able to publish the manuscript thereof in the second volume of my treatise on "The Mathematical Theory of Probabilities".

23. PARAMETERS BY LEAST SQUARES

The above development is based upon the theory of functions and the theory of definite integrals. We shall now see how the same problem may be attacked by the method of least squares after we have determined by the usual method of moments the values of $m$ and $n$ in the generating function $\varphi_0(z)$.

Viewed from this point of vantage our problem may be stated as follows:

Given an arbitrary frequency distribution of the variate $z$ with $z = (\log x - m) : n$ and where $x$ is reckoned from a zero point or origin, which is situated $a$ units below the mean and defined by the relation

$$\eta^3 \lambda_3 - 3 \eta^2 \lambda_2^2 = \lambda_2^3, \quad \text{where } a = \lambda_1 - \eta;$$

to develop $F(z)$ into a frequency series of the form

$$F(z) = k_0 \varphi_0(z) + k_1 \varphi_1(z) + k_2 \varphi_2(z) + \dots$$

where the $k$'s must be determined in such a way that the expression

$$\sum_{i=0}^{i=n} k_i \varphi_i(z)$$

gives the best approximation to $F(z)$ in the sense of the method of least squares.

Stated in this form the frequency function is reduced to the ordinary series of Gram or the A type of the Charlier series, already treated in the earlier chapters.
24. APPLICATION OF A MORTALITY TABLE

As an illustration of the application of the theory to a practical problem we present the following frequency distribution by 5-year age intervals of the number of deaths (or $\Sigma d_x$ by quinquennial grouping) in the recently published American-Canadian Mortality of Healthy Males, based on a radix of 100,000 entrants at age 15.

Frequency Distribution of Deaths by Attained Ages in American-Canadian Mortality Table.

Ages         Σd_x     1st Component    2d Comp.
15-19       1,801           120          1,681
20-24       1,996           230          1,766
25-29       2,089           440          1,649
30-34       2,120           790          1,330
35-39       2,341         1,370            971
40-44       2,911         2,270            641
45-49       3,937         3,570            367
50-54       5,527         5,400            127
55-59       7,723         7,722              1
60-64      10,383        10,383
65-69      12,987        12,987
70-74      14,535        14,535
75-79      13,807        13,807
80-84      10,328        10,328
85-89       5,464         5,464
90-94       1,757         1,757
95-99         278           278
100-104        16            16

Total     100,000        91,467          8,533
The curve represented by the $d_x$ column is evidently a composite frequency function compounded of several series. From a purely mathematical point of view the compound curve may be considered as being generated in an infinite number of ways as the summation of separate component frequency curves. From the point of view of a practical graduation it is, however, easy to break this compound death curve up into two separate components. A mere glance at the $d_x$ curve itself suggests a major skew frequency curve with a maximum point somewhere in the age interval from 70 to 75 and a minor curve (practically one-sided) for the younger ages.

Let us therefore break the $\Sigma d_x$ column up into the two so far perfectly arbitrary parts as shown in the above table and then try to fit those two distributions to logarithmically transformed A curves.

Starting with the first component, the straightforward computation of the semi-invariants is given in the table below with the provisional mean chosen at age 67.
Frequency Distribution of Deaths in American Mortality Table. First Component.

Ages          x       F(x)      xF(x)     x²F(x)       x³F(x)
104-100      -7         16        112        784        5,488
99-95        -6        278      1,668     10,008       60,048
94-90        -5      1,757      8,785     43,925      219,625
89-85        -4      5,464     21,856     87,424      349,696
84-80        -3     10,328     30,984     92,952      278,856
79-75        -2     13,807     27,614     55,228      110,456
74-70        -1     14,535     14,535     14,535       14,535
69-65         0     12,987          0          0            0

Subtotal            59,172    105,554    304,856    1,038,704

64-60        +1     10,383     10,383     10,383       10,383
59-55        +2      7,723     15,446     30,892       61,784
54-50        +3      5,400     16,200     48,600      145,800
49-45        +4      3,570     14,280     57,120      228,480
44-40        +5      2,270     11,350     56,750      283,750
39-35        +6      1,370      8,220     49,320      295,920
34-30        +7        790      5,530     38,710      270,970
29-25        +8        440      3,520     28,160      225,280
24-20        +9        230      2,070     18,630      167,670
19-15       +10        120      1,200     12,000      120,000

Subtotal            32,296     88,199    350,565    1,810,037

s_r                 91,468    -17,355    655,421      771,333

Computing the semi-invariants by means of the usual formulas in paragraph 13, we have:

$\lambda_1 = -17355 : 91468 = -0.18974$, or mean at age 67 + 5 (0.19) or at age 67.95

$\lambda_2 = 655421 : 91468 - \lambda_1^2 = 7.1296$

$\lambda_3 = 771333 : 91468 - 3 \lambda_1 m_2 + 2 \lambda_1^3 = 12.4981.$
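The arithmetic above can be retraced with a few lines of code. This is only a sketch: the function name `semi_invariants` is an assumption of this illustration, and the power sums are the totals $s_0, s_1, s_2, s_3$ from the bottom row of the preceding table.

```python
def semi_invariants(s0, s1, s2, s3):
    """First three semi-invariants from the power sums s_0 .. s_3."""
    m1, m2, m3 = s1 / s0, s2 / s0, s3 / s0   # moments about the provisional mean
    lam1 = m1
    lam2 = m2 - m1 ** 2
    lam3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    return lam1, lam2, lam3

# Power sums of the first component, in 5-year units around age 67.
lam1, lam2, lam3 = semi_invariants(91468, -17355, 655421, 771333)

# x is negative for the older ages, so a negative lam1 moves the mean upward.
mean_age = 67 - 5 * lam1

print(round(lam1, 5), round(lam2, 4), round(lam3, 3), round(mean_age, 2))
```

The results agree with the printed semi-invariants to about three decimal places; the small residual in $\lambda_3$ is presumably a matter of rounding in the original hand computation.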

In order to determine the mathematical zero or the origin we have to solve the following cubic:

$$\lambda_3 \eta^3 - 3 \lambda_2^2 \eta^2 = \lambda_2^3 \quad \text{or} \quad 12.498\, \eta^3 - 152.511\, \eta^2 = 362.47$$

the positive root of which is equal to 12.39. The zero point is therefore found to be situated 12.39 5-year units from the mean or at age 67.95 + 5 (12.39), i. e. very nearly at age 130, which we henceforth shall select as the origin of the coordinate system of the first component. We have furthermore

$$12.39 = e^{m + 1.5 n^2}, \quad \text{and} \quad 7.1296 = e^{2m + 3n^2} \left( e^{n^2} - 1 \right) = (12.39)^2 \left( e^{n^2} - 1 \right),$$

the solution of which gives $n^2 = 0.04436$, $n = 0.2106$, $m = 2.4504$, all on the basis of a 5-year interval as unit. If we wish to change to a single calendar year unit we must add the natural logarithm of 5, or 1.6094, to the above value of $m$, which gives us $m = 4.0598$, while $n$ remains the same. The above computations furnish us with the necessary material for the logarithmic transformation of the variate $x$ which now may be written as

$$z = [\log (130 - x) - 4.0598] : 0.2106,$$

where $x$ is the original variate or the age at death.
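The determination of the zero point can be sketched numerically as follows. The bisection routine and all names here are assumptions of this illustration, not the author's procedure; the inputs are the rounded semi-invariants quoted above. With these inputs the root and the zero age come out as printed, while $n^2$ and $m$ land near, but not exactly on, the printed figures; the sketch makes no attempt to reconcile that difference.

```python
import math

def zero_distance(lam2, lam3, lo=1e-9, hi=1e3, iters=200):
    """Positive root eta of lam3*eta^3 - 3*lam2^2*eta^2 - lam2^3 = 0 (bisection).

    Assumes lam3 > 0, in which case the cubic is negative below its single
    positive root and positive above it.
    """
    def f(eta):
        return lam3 * eta ** 3 - 3.0 * lam2 ** 2 * eta ** 2 - lam2 ** 3
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam2, lam3 = 7.1296, 12.4981      # semi-invariants in 5-year units
mean_age = 67.95

eta = zero_distance(lam2, lam3)
zero_age = mean_age + 5.0 * eta                 # origin of the first component
n2 = math.log(1.0 + lam2 / eta ** 2)            # from e^{n^2} = 1 + lam2 : eta^2
m_five_year = math.log(eta) - 1.5 * n2          # from eta = e^{m + 1.5 n^2}
m_single_year = m_five_year + math.log(5.0)     # change of unit to one year

print(round(eta, 2), round(zero_age, 1))
print(n2, math.sqrt(n2), m_five_year, m_single_year)
```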
Having thus accomplished the logarithmic transformation we may henceforth write the generating function as

$$\Phi_0(x) = \frac{1}{0.2106 \sqrt{2\pi}} e^{-\frac{1}{2} \left[ \frac{\log (130 - x) - 4.0598}{0.2106} \right]^2} = \frac{1}{0.2106 \sqrt{2\pi}} e^{-z^2 : 2} = \varphi_0(z).$$

We express now $F(x)$ by the following equation:

$$F(x) = k_0 \Phi_0(x) + k_3 \Phi_3(x) + k_4 \Phi_4(x) + \dots$$

or in terms of the transformed $z$:

$$\varphi(z) = k_0 \varphi_0(z) + k_3 \varphi_3(z) + k_4 \varphi_4(z) + \dots,$$

and proceed to determine the numerical values of $k$ by the method of least squares.

The numerical calculation required by this method follows precisely along the same lines as described in paragraph 17. I shall for this reason not reproduce these calculations but limit myself to quoting the final results for the various coefficients $k$, which are as follows:¹

$$k_0 = 7361.8; \quad k_3 = -212.2; \quad k_4 = -9.6.$$

¹ Interested readers may consult the detailed computations on pages 246-267 in my Mathematical Theory of Probabilities (2nd Edition, New York, 1921).

The final equation of the frequency curve of the first component, $F_I(x)$, is therefore:

$$F_I(x) = 7361.8\, \varphi_0(z) - 212.2\, \varphi_3(z) - 9.6\, \varphi_4(z),$$

where the generating function, $\varphi_0(z)$, is of the form:

$$\varphi_0(z) = \Phi_0(x) = \frac{1}{0.2106 \sqrt{2\pi}} e^{-\frac{1}{2} \left[ \frac{\log (130 - x) - 4.0598}{0.2106} \right]^2}$$

The second component, $F_{II}(x)$, can by means of a similar process be expressed by the equation:

$$F_{II}(x) = 947.4\, \varphi_0(z) - 63.4\, \varphi_3(z) - 30.0\, \varphi_4(z),$$

where

$$\varphi_0(z) = \Phi_0(x) = \frac{1}{0.12 \sqrt{2\pi}} e^{-\frac{1}{2} \left[ \frac{\log (x + 68.8) - 4.582}{0.12} \right]^2}$$

Addition of these two component curves gives us the ultimate compound frequency curve, representing the $d_x$ of the mortality table.

A comparison between the observed values of $d_x$ and the values of $d_x$ as computed from the above equation is shown in graphical form in the attached diagram. Evidently the graduation leaves but little to be desired in the way of closeness of fit.
Figure 1.

Diagram showing graduation of $d_x$ column in the AM (5) table by a compound frequency curve of the Gram-Charlier types.

25. BIOLOGICAL INTERPRETATION OF MORTALITY

It appears that the Italian statisticians were the first to break up the $d_x$ curve into a system of five or more component frequency curves, which, however, were all of the normal Laplacean type; Pearson, who in a brilliant essay entitled Chances of Death was the next to attack the problem, employed a system of five skew frequency curves. Already as early as 1914 I found that from ages above 10 the majority of $d_x$ curves in previously constructed mortality tables could be represented by not more than two skew frequency curves as shown in the above example of the AM (5) table.

Although all such investigations may be very interesting and useful from the point of view of the actuary, we must, however, not overlook the fact that the breaking up of the compound $d_x$ curve in the manner just described is merely an empirical process pure and simple. While such processes undoubtedly represent very neat methods of graduation, a quite different and more important question is whether mathematical work of this kind allows of a biological interpretation. It is evident that from a mere mathematical point of view we may break up the $d_x$ curve into various component parts in an infinite number of ways. But while such breaking up processes may be extremely interesting as actuarial graduations and exercises in pure mathematics, they have evidently little connection with the underlying biological facts of a mortality table. This aspect of the question has been brought out in a very forcible manner by the eminent American biometrician, Raymond Pearl, in his 1920 Lowell Institute Lectures. The whole subject would appear in a quite different light if it were possible to give a biological interpretation of the mathematical analysis and to show that the component frequency curves as derived from pure mathematics have a counterpart in actual life. This, I think, would be very difficult, if not impossible to establish, because it is not mathematics which determines the conduct or behavior of living organisms. One might, however, view the whole problem from the standpoint of the biologist rather than from the standpoint of the mathematician. The problem then is to ascertain whether the observed biological facts as shown in the collected statistical data allow of a mathematical interpretation, rather than to find a biological interpretation and counterpart of previously established empirical formulae.

It is to this important question that I have devoted the entire discussion of the second chapter of this book. I have proceeded from certain observed biological facts (in this particular instance the statistics on the number of deaths by sex and attained ages from more than 150 causes of death) which represent the natural phenomena under investigation. In order to offer a rational explanation of these facts and to interpret their quantitative relationships, I have adopted as a working hypothesis the supposition that the numbers of deaths according to attained age and sex among the survivors of a homogeneous cohort of say 1,000,000 entrants at age 10 tend to cluster around specific ages in such a manner that their frequency distribution by attained ages can be represented by a limited number of sets of Gram-Charlier or Poisson-Charlier frequency curves.

On the basis of this hypothesis we can now by simple mathematical deductions construct a mortality table from deaths by sex, age and cause of death and without any information about the lives exposed to risk at various ages.

Finally we can verify the ultimate results contained in this final mortality table by working back from the table to the data originally observed.

This procedure is in strict conformity with the model of modern science, which according to Jevons consists of the four processes of observation, hypothesis, deduction and verification.

The important factor in this investigation, and one which most actuaries and statisticians fail to grasp, is that I have looked at the whole problem as a biometrician rather than as a mathematician. Mathematics has been employed only as a working tool in the whole process, and the reason that the method has met with success must be sought for in concrete biological facts and not in the realm of mathematics.
26. POISSON'S FUNCTION

In certain statistical series it frequently happens that the semi-invariants of higher order than zero all are equal, or that

$$\lambda_1 = \lambda_2 = \lambda_3 = \dots = \lambda_r = \lambda.$$

We shall for the present limit our discussion to homograde statistical series where the variates always are positive and integral, and where therefore the definition of the semi-invariants is of the form:

$$e^{\frac{\lambda \omega}{1!} + \frac{\lambda \omega^2}{2!} + \frac{\lambda \omega^3}{3!} + \dots} = \varphi(0) e^{0 \cdot \omega} + \varphi(1) e^{\omega} + \varphi(2) e^{2\omega} + \varphi(3) e^{3\omega} + \dots,$$

or

$$e^{\lambda (e^{\omega} - 1)} = \sum \varphi(x) e^{x\omega} \quad \text{for } x = 0, 1, 2, 3, \dots,$$

which also can be written as

$$e^{-\lambda} \left( 1 + \frac{\lambda e^{\omega}}{1!} + \frac{\lambda^2 e^{2\omega}}{2!} + \dots \right) = \varphi(0) \cdot 1 + \varphi(1) e^{\omega} + \varphi(2) e^{2\omega} + \dots$$

The coefficient of $e^{r\omega}$ gives the relative frequency or the probability for the occurrence of $x = r$, and we find therefore that

$$\varphi(r) = \psi(r) = \frac{e^{-\lambda} \lambda^r}{r!}.$$

This is the famous Poisson Exponential, so called after the French mathematician, Poisson, who first derived this expression in his Recherches sur la probabilité des jugements, but in an entirely different manner than the one we have indicated above.

The Poisson Exponential opens a new way for the treatment of statistical series which possess the attribute that all their semi-invariants of higher order than zero are equal, or nearly equal. It is readily seen that whereas the Laplacean probability function $\varphi_0(x)$ contains two parameters $\lambda_1$ and $\sigma$, the probability function of Poisson contains only one parameter, $\lambda$.
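The defining property, equality of the semi-invariants, can be checked numerically for the first three orders, which coincide with the mean, the variance and the third moment about the mean. The sketch below is illustrative only; the helper name `poisson_pmf`, the value of $\lambda$ and the truncation point are assumptions.

```python
import math

def poisson_pmf(lam, x_max):
    """Poisson exponential e^{-lam} lam^x / x! for x = 0 .. x_max, by recurrence."""
    p = [math.exp(-lam)]
    for x in range(1, x_max + 1):
        p.append(p[-1] * lam / x)
    return p

lam = 2.5                       # arbitrary illustrative parameter
p = poisson_pmf(lam, 80)        # the tail beyond x = 80 is negligible here

mean = sum(x * px for x, px in enumerate(p))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(p))
third = sum((x - mean) ** 3 * px for x, px in enumerate(p))

print(mean, variance, third)    # all three should be close to lam
```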



27. POISSON-CHARLIER FREQUENCY CURVES

We have already seen in the previous chapters that the Gram-Charlier frequency curve could be written as

$$F(x) = \sum c_i \varphi_i(x) = \sum c_i H_i(x) \varphi_0(x) \quad \text{for } i = 0, 1, 2, 3, \dots$$

where $\varphi_0(x)$ is the generating Laplacean probability function.

The idea now immediately suggests itself to use a similar method of expansion in the case of the Poisson probability function and to employ this exponential as a generating function in the same manner as the Laplacean function. We are, however, in the present case of the Poisson exponential dealing with a generating function which so far has been defined for positive integral values only and, therefore, represents a discrete function. For this reason it will be impossible to express the series as the sum-products of the successive derivatives of the generating function and their correlated parameters $c$. We can, however, in the case of integral variates express the series by means of finite differences and write $F(x)$ as follows:

$$F(x) = c_0 \psi(x) + c_1 \Delta\psi(x) + c_2 \Delta^2\psi(x) + \dots \quad (I)$$

where $\psi(x) = e^{-m} m^x : x!$ for $x = 0, 1, 2, 3, \dots$, and

$$\sum \psi(x) = 1,$$

$$\Delta\psi(x) = \psi(x) - \psi(x - 1),$$

$$\Delta^2\psi(x) = \Delta\psi(x) - \Delta\psi(x - 1) = \psi(x) - 2\psi(x - 1) + \psi(x - 2).$$

The series (I) is known as the Poisson-Charlier frequency series or Charlier's B type of frequency curves.

The semi-invariants of these frequency series are given by the following relation:

$$e^{\frac{\lambda_1 \omega}{1!} + \frac{\lambda_2 \omega^2}{2!} + \dots} = \sum_{x=0}^{x=\infty} \left[ c_0 \psi(x) + c_1 \Delta\psi(x) + c_2 \Delta^2\psi(x) + \dots \right] e^{x\omega}$$

Expanding and equating the coefficients of equal powers of $\omega$ we have:

$$\lambda_0 = 1 = c_0 \sum \psi(x) \quad \text{or} \quad c_0 = 1$$

$$\lambda_1 = \sum x \left( \psi(x) + c_1 \Delta\psi(x) + c_2 \Delta^2\psi(x) + \dots \right) \quad (II)$$

$$\lambda_1^2 + \lambda_2 = \sum x^2 \left( \psi(x) + c_1 \Delta\psi(x) + c_2 \Delta^2\psi(x) + \dots \right)$$

We now have

$$\sum \psi(x) = 1, \quad \text{and}$$

$$\sum x \psi(x) = \sum m e^{-m} m^{x-1} : (x - 1)! = m \sum \psi(x - 1) = m.$$

We also find from well-known formulas of the calculus of finite differences that*

$$\sum x \Delta\psi(x) = -1$$

$$\sum x \Delta^2\psi(x) = 0$$

$$\sum x^2 \Delta\psi(x) = -(2m + 1)$$

$$\sum x^2 \Delta^2\psi(x) = 2$$

* These formulas can also be derived from the definition of the semi-invariants and the well-known relations between moments and semi-invariants as given on page 74 when we remember that according to our definition all semi-invariants in the Poisson exponential are equal to $m$.

Substituting these values in (II) we obtain

$$\lambda_1 = m - c_1$$

$$\lambda_1^2 + \lambda_2 = m^2 + m - (2m + 1) c_1 + 2 c_2$$

By letting $m = \lambda_1$ we can make the coefficient $c_1$ vanish, which results in

$$\lambda_1 = m$$

$$c_2 = \tfrac{1}{2} [\lambda_2 - m]$$

where the two semi-invariants $\lambda_1$ and $\lambda_2$ are calculated around the natural zero of the number scale as origin.

For the above discussion we have limited ourselves to the determination of the three constants $m$, $c_0$ and $c_2$. It is easy, however, to find the higher parameters $c_3, c_4, c_5, \dots$ from the relations between the moments of the Poisson function and the semi-invariants of order 3, 4, 5, etc. Charlier usually calls the parameter $m$ the modulus and $c_2$ the eccentricity of the B curve.
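The finite-difference sums used in the substitution can be confirmed numerically. The sketch below is an illustration under stated assumptions: the helper names are invented here, $\psi(x)$ is taken as zero for negative $x$, the modulus $m = 3.88$ is an arbitrary choice, and the infinite sums are truncated where the terms have become negligible.

```python
import math

def psi(x, m):
    """Poisson exponential e^{-m} m^x / x!, taken as zero for negative x."""
    if x < 0:
        return 0.0
    return math.exp(-m + x * math.log(m) - math.lgamma(x + 1))

def d1(x, m):
    """First backward difference psi(x) - psi(x - 1)."""
    return psi(x, m) - psi(x - 1, m)

def d2(x, m):
    """Second backward difference psi(x) - 2 psi(x - 1) + psi(x - 2)."""
    return psi(x, m) - 2.0 * psi(x - 1, m) + psi(x - 2, m)

m = 3.88                 # illustrative modulus
xs = range(0, 120)       # truncation point, assumed ample for this m

sum_x_d1 = sum(x * d1(x, m) for x in xs)
sum_x_d2 = sum(x * d2(x, m) for x in xs)
sum_x2_d1 = sum(x * x * d1(x, m) for x in xs)
sum_x2_d2 = sum(x * x * d2(x, m) for x in xs)

print(sum_x_d1, sum_x_d2)        # expected near -1 and 0
print(sum_x2_d1, sum_x2_d2)      # expected near -(2m + 1) and 2
```

With these four sums the two relations in (II) follow term by term, and setting $c_1 = 0$ leaves $\lambda_2 = m + 2c_2$.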
28. NUMERICAL EXAMPLES

As an illustration of the application of the Poisson-Charlier series we select the following series of observations on alpha particles radiated from a bar of Polonium as determined by Rutherford and Geiger.

The appended table states the number of times, $F(x)$, the number of particles given off in a long series of intervals, each lasting one-eighth of a minute, had a given value $x$:

 x    F(x)      x    F(x)      x    F(x)
 0     57       5    408      10     10
 1    203       6    273      11      4
 2    383       7    139      12      0
 3    525       8     45      13      1
 4    532       9     27      14      1

We are here dealing with integral variates which can assume positive values only and the observations are therefore eminently adaptable to the treatment by Poisson-Charlier curves. Selecting the natural zero as the origin of the coordinate system we find that the first two semi-invariants are of the form

$\lambda_1 = 3.8754$, $\lambda_2 = 3.6257$, and we therefore have:

$m = \lambda_1 = 3.88$; $c_2 = \tfrac{1}{2} [\lambda_2 - m] = -0.125$.

The equation for the frequency distribution of the total $N = 2608$ elements therefore becomes

$$F(x) = N \left[ \psi_{3.88}(x) + (-0.125)\, \Delta^2 \psi_{3.88}(x) \right].$$

The table below gives the values as fitted to the curve, $F(x)$:

Alpha Particles Discharged from Film of Polonium (Rutherford and Geiger).

N = 2608, m = 3.88, c_2 = -0.125

(1)     (2)          (3)          (4)         (5)             (6)
 x      ψ(x)         Δ²ψ(x)       N × (2)     N × (3) × c_2   (4) + (5)
 0    .020668     +.020668        53.9         -6.7              47
 1    .080156     +.038820       209.0        -12.7             196
 2    .155455     +.015811       405.4         -5.2             400
 3    .201015     -.029793       524.2         +9.7             533
 4    .194967     -.051608       508.5        +16.8             525
 5    .151265     -.037654       394.5        +12.3             407
 6    .097850     -.009714       254.9         +3.2             258
 7    .054249     +.009814       141.2         -3.2             138
 8    .026316     +.015668        68.7         -5.1              64
 9    .011351     +.012968        29.6         -4.2              25
10    .004407     +.008021        11.5         -2.6               9
11    .001555     +.004092         4.1         -1.2               3
12    .000503     +.001800         1.3         -0.6               1
13    .000150     +.000699         0.4         -0.2               0
14    .000042     +.000245         0.1         -0.1               0
15    .000010     +.000076         0.0         -0.0               0
16    .000003     +.000025
17    .000001     +.000005

As a second example we offer our old friend, the distribution of flower petals in Ranunculus Bulbosus. Selecting the zero point at $x = 5$ and computing the semi-invariants in the usual manner we obtain the following equation for the frequency curve.

$$F(x) = 222\, \psi(x) + 31.5\, \Delta^2\psi(x), \quad m = 0.631.$$

A comparison between calculated and observed values follows:

 x      F(x)     Obs.
 5     134.9     133
 6      51.6      55
 7      22.5      23
 8       9.5       7
 9       2.9       2
10       0.6       2
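The calculated column of this second example can be retraced with a short sketch of the B series. The function names are assumptions of this illustration; the constants 222, 31.5 and $m = 0.631$ and the observed counts are those quoted above, and the Poisson argument is the number of petals less 5.

```python
import math

def psi(k, m):
    """Poisson exponential e^{-m} m^k / k!, taken as zero for negative k."""
    if k < 0:
        return 0.0
    return math.exp(-m + k * math.log(m) - math.lgamma(k + 1))

def d2psi(k, m):
    """Second backward difference of psi at k."""
    return psi(k, m) - 2.0 * psi(k - 1, m) + psi(k - 2, m)

def fitted(petals, m=0.631, c0=222.0, c2=31.5, origin=5):
    """F(x) = 222 psi(x) + 31.5 Delta^2 psi(x), with x counted from the origin."""
    k = petals - origin
    return c0 * psi(k, m) + c2 * d2psi(k, m)

observed = {5: 133, 6: 55, 7: 23, 8: 7, 9: 2, 10: 2}
for petals, obs in observed.items():
    print(petals, round(fitted(petals), 1), obs)
```

The fitted values agree with the printed column to within about a tenth of a unit, the remaining differences being of the size one would expect from rounding in the hand computation.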



29. TRANSFORMATION OF THE VARIATE

For integral variates we have shown that the Poisson frequency curve possesses the important property that all its semi-invariants are equal. Now while a frequency distribution of a certain integral variate, $x$, may perhaps not possess this property, it may, however, very well happen after a suitable linear transformation has been made, that the variate thus transformed will be subject to the laws of Poisson's function.

Let $z = ax - b$ represent the linear transformation which is subject to the above laws with a series of semi-invariants all equal to $m$.

These semi-invariants according to the properties set forth in paragraph 5 are therefore

$$m = \lambda_1(z) = a \lambda_1(x) - b$$

$$m = \lambda_2(z) = a^2 \lambda_2(x)$$

$$m = \lambda_3(z) = a^3 \lambda_3(x)$$

and our problem is to find the unknown parameters $a$, $b$ and $m$.

Simple algebraic methods, which it will not be necessary to dwell upon, give the following results:

$$a = \lambda_2 : \lambda_3$$

$$m = \lambda_2^3 : \lambda_3^2$$

$$b = a \lambda_1 - m$$

As a numerical illustration of this transformation we choose from Jørgensen a series of observations by Davenport on the frequency distribution of glands in the right foreleg of 2000 female swine.

No. of Glands     0    1    2    3    4    5    6    7    8    9   10
Frequency        15  209  365  482  414  277  134   72   22    8    2

The values of the three first semi-invariants are

$\lambda_1 = 3.501$, $\lambda_2 = 2.825$, $\lambda_3 = 2.417$,

$a = 2.825 : 2.417 = 1.168$,

$m = 2.825^3 : 2.417^2 = 3.859$,

$b = (1.168)(3.501) - 3.859 = 0.230$.
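The whole chain from the raw frequencies to the constants $a$, $m$ and $b$ can be sketched in a few lines. The variable names are assumptions of this illustration; the frequencies are those of the table above. Carrying full precision gives values agreeing with the printed ones to about the third decimal, the printed figures having been rounded at intermediate steps.

```python
# Frequencies of 0, 1, ..., 10 glands among the 2000 swine.
freq = [15, 209, 365, 482, 414, 277, 134, 72, 22, 8, 2]
total = sum(freq)

# Moments about the natural zero of the number scale.
m1 = sum(x * f for x, f in enumerate(freq)) / total
m2 = sum(x ** 2 * f for x, f in enumerate(freq)) / total
m3 = sum(x ** 3 * f for x, f in enumerate(freq)) / total

# First three semi-invariants.
lam1 = m1
lam2 = m2 - m1 ** 2
lam3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3

# Constants of the linear transformation z = a x - b.
a = lam2 / lam3
m = lam2 ** 3 / lam3 ** 2
b = a * lam1 - m

print(total, round(lam1, 3), round(lam2, 3), round(lam3, 3))
print(a, m, b)
```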

The new variable then becomes $z = ax - b$ and the transformed Poisson probability function takes on the form:

$$\psi(z) = \frac{e^{-m} m^z}{z!}.$$

In general, however, we will find that $z$ is not a whole number and the expression $z!$ therefore has no meaning from the point of view of factorials at least. This difficulty may, however, be overcome through the introduction of the well-known Gamma Function, $\Gamma(z + 1)$, which is defined for any real value of $z$ other than the negative integers and which in the case of non-negative integral values of $z$ reduces to $\Gamma(z + 1) = z!$

Hence we can write the transformed Poisson probability function as

$$\psi(z) = \frac{e^{-m} m^z}{\Gamma(z + 1)}.$$

Tables to 7 decimal places of the Gamma Function, or rather for the expression $1 : \Gamma(z + 1)$, have been computed by Jørgensen in his Frekvensflader and Korrelation from $z = -5$ to $z = 15$, progressing by intervals of 0.01.

By means of this table and the tables of ordinary logarithms it is now easy to find the values of $\psi(z)$ in the case of the example relating to the number of glands in female swine. The detailed computation is shown below.¹

(1)     (2)        (3)               (4)          (5)                     (6)       (7)
 x       z       log 1/Γ(z+1)      log m^z      (3) + (4) + log e^-m     ψ(z)      F(x)
 0     -.230       .9209            .8651         .1101-2               .0129      30.1
 1     +.938       .0108            .5500         .8849-2               .0767     179.2
 2     2.106       .6555            .2350         .2146-1               .1639     382.9
 3     3.274       .0679            .9199         .3119-1               .2051     479.1
 4     4.442       .3216            .6048         .2501-1               .1780     415.8
 5     5.610       .4547            .2897         .0685-1               .1171     273.6
 6     6.778       .4904            .9746         .7891-2               .0615     143.7
 7     7.946       .4446            .6595         .4282-2               .0268      62.6
 8     9.114       .3285            .3444         .9970-3               .0099      23.1
 9    10.282       .1506            .0294         .5041-3               .0032       7.5
10    11.450       .9177            .7143         .9561-4               .0009       2.1

¹ The characteristics of the logarithms have been omitted in this table (except in column 5) and only the positive mantissas are shown. Column 7 represents the 2000 individual observations pro rated according to column 6.
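Columns 6 and 7 can be retraced without logarithm tables by using a library routine for the logarithm of the Gamma Function. The sketch below assumes the constants $a = 1.168$, $b = 0.230$, $m = 3.859$ as printed, and interprets "pro rated" as scaling column 6 so that its total is 2000; the function name is an assumption of this illustration.

```python
import math

a, b, m = 1.168, 0.230, 3.859     # constants quoted in the text

def psi(z, m):
    """Transformed Poisson function e^{-m} m^z / Gamma(z + 1) for real z > -1."""
    return math.exp(-m + z * math.log(m) - math.lgamma(z + 1.0))

z_values = [a * x - b for x in range(11)]
psi_values = [psi(z, m) for z in z_values]

# Column 7: the 2000 observations pro rated according to column 6.
scale = 2000.0 / sum(psi_values)
f_values = [scale * p for p in psi_values]

for x in range(11):
    print(x, round(z_values[x], 3), round(psi_values[x], 4), round(f_values[x], 1))
```

The printed values should track columns 6 and 7 of the table closely; small discrepancies in the last figure are to be expected from four-place logarithms.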



CHAPTER II

(TRANSLATED BY MR. VIGFUSSON)

THE HUMAN DEATH CURVE

1. INTRODUCTORY REMARKS

In the following paragraphs I intend to discuss a method of constructing mortality tables from mortuary records by sex, age and cause of death, but without reference to or knowledge of the exposed to risk at various ages. This proposed method is indeed one which has been severely criticized in certain quarters, and several critics flatly deny that it is possible to construct mortality tables from such data without detailed information of the exposed to risk. It is, however, a very dangerous practice to say that a certain thing is impossible. The true scientist, least of all, should attempt to set limits for the extension of human knowledge. It is still remembered how the great Auguste Comte once denied that it ever would be possible to determine the chemical constituents of the celestial bodies. Only a few years after this emphatic denial by the brilliant Frenchman the spectroscope was discovered, by means of which we have been able to detect a number of chemical elements of other worlds than that of our own little earth. It is but fair to say that the method which we here shall describe has met with rather determined opposition in certain actuarial quarters. Under such circumstances it is natural that the process will be viewed in a light of scepticism and criticism. I welcome such an attitude because it has been my purpose to present the following studies for further investigation and not to force them upon my readers as authoritative or as a kind of infallible dogma.

In presenting the outlines of the proposed method I wish to state that it has never been the intention to supplant the orthodox methods of constructing mortality tables where we have exact information of the so-called "exposed to risk" or number living at various ages. Numerous and very important examples, however, offer themselves in actuarial and statistical practice where such information is not available. Most of the greater American Life Insurance Companies, especially those writing the so-called industrial insurance, have on hand an enormous amount of information of deaths by sex, attained age and by cause of death among their policyholders. Even the mortuary records of certain occupations, as for instance metal and coal miners, among the death claims in the industrial class are so numerous that it would be possible to construct a mortality table for such professions if we knew the exact number exposed to risk at various ages. Such information is, however, in the majority of cases wanting, or could only be obtained by means of a great expenditure of time and labor. Again, as Mr. F. S. Crum has pointed out in an article in the "Insurance and Commercial Magazine", a number of cities and states in the United States give from year to year very detailed information in regard to mortuary records by sex, age at death and cause of death. On account of the intense migration taking place in certain sections of the United States, especially in those of an industrial character, it is, however, impossible to know the exact population at various ages, except in the particular years in which the federal or state census has been taken. Since for all but a few states of this country the intercensal period is no less than ten years, the determination of the population composition by age and sex for a given locality and intercensal year, with any degree of accuracy, becomes a practical impossibility without a special count. Such a count or census of a specific locality or a single city is, however, a costly undertaking at its best, for which the necessary funds are rarely available. In all such instances the mortuary records are practically worthless in so far as the construction and computation of death rates are concerned, if we are to rely solely upon the usual method of constructing mortality tables. It will therefore readily be seen that, apart from purely academic interests, the possibility of establishing a method of constructing mortality tables without knowing the population exposed to risk at various ages would be of great practical value, and I deem no apology necessary to present the following method, which intends to overcome this very obstacle of having no information of the exposures.

2. EMPIRICAL AND INDUCTIVE METHODS OF SOLUTION

In order to bring the method into the proper perspective it will be of value to contrast it with the ordinary methods followed in the construction of mortality tables. Let us therefore briefly review those methods and principles commonly employed by actuaries and statisticians. A certain number, say $L_x$ persons at age $x$, are kept under observation for a full calendar year and the number, $D_x$, who die among the original entrants during the same year is recorded. The ratio $D_x : L_x$ is then considered as the crude probability of dying at age $x$. Similar crude rates are obtained for all other ages and are then subjected to a more or less empirical process of graduation to smooth out the irregularities arising from what is considered as random sampling. One then chooses an arbitrary radix, say for instance 100,000 persons at age 10, which represents a hypothetical cohort of 10-year old children entering under our observation. This radix is then multiplied by the previously constructed value of $q_{10}$ and the product represents the number dying at age 10. This number, $d_{10}$, is subtracted from $l_{10}$ or 100,000 and the difference is the number living at age 11 or $l_{11}$. This latter number is then multiplied by $q_{11}$ and the result is $d_{11}$, or the number dying at age 11 out of the original cohort of 100,000. In this way one continues for all ages up to 106, or so.

It is to be noted that the column of q_x in this process represents
the fundamental column while the columns of l_x and d_x are purely
auxiliary columns.
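The empirical procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `life_table_from_q` and the three `toy_q` values are hypothetical and are not taken from any mortality table in the text.

```python
def life_table_from_q(q, start_age=10, radix=100_000):
    """Build the auxiliary l_x and d_x columns from graduated q_x values.

    q lists q_x for consecutive ages beginning at start_age.
    """
    rows = []
    living = float(radix)            # l_x at the starting age (the radix)
    for offset, qx in enumerate(q):
        dying = living * qx          # d_x = l_x * q_x
        rows.append((start_age + offset, living, dying))
        living -= dying              # l_(x+1) = l_x - d_x
    return rows


# Hypothetical q_x values for ages 10, 11 and 12 (illustration only).
toy_q = [0.002, 0.003, 0.004]
table = life_table_from_q(toy_q)
for age, l_x, d_x in table:
    print(f"age {age}: l_x = {l_x:.1f}, d_x = {d_x:.1f}")
```

The loop makes the dependence visible: every l_x and d_x is derived from the q_x column, which is why that column is the fundamental one in the empirical method.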

Allow us here to ask a simple question. Do these empirically derived
numbers of deaths at various ages out of an original cohort of 100,000
entrants at age 10 give us any insight or clue as to the exact nature
of the biological phenomenon known as death, and are we by this method
enabled to lift the veil and trace the numerous causes which must have
been at work and served to produce the total effect, the d_x curve, of
which we by means of the usual methods have a purely empirical
representation? I fear that this question will have to be answered in
the negative. The usual actuarial methods do not give us a single
glance into the relation between cause and effect, which after all is
the ultimate object of investigation for all real science. Probably
some critics would answer that they are not interested in
investigating causal relations. Such an attitude of indifference is,
however, very dangerous for a statistician or an actuary whose very
work rests upon the validity of the law of causality. We may, however,
overlook this apparent inconsistency of the empiricists and turn our
attention to the proposed methods of constructing mortality tables
along inductive lines, or by the process which Jevons has termed a
complete induction.

Such a process we should find diametrically opposite to the methods
of the empiricists, both in respect to points of attack and deduction.
In the case of the empiricists the q_x is the initial and fundamental
function from which the d_x column is computed as a mere by-product.
The rationalistic method starts with the d_x column and terminates
with the q_x as the by-product.
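The reversed order of computation can be illustrated with a short Python sketch. It assumes a complete d_x column running to the limiting age, so that the survivors at any age are simply the deaths still to come; the function name `q_from_deaths` and the three toy values are hypothetical.

```python
def q_from_deaths(d):
    """Derive the l_x and q_x columns from a complete d_x column.

    d lists the deaths at consecutive ages and must run to the limiting
    age, so that l_x = d_x + d_(x+1) + ... and every l_x is positive.
    """
    survivors = []
    remaining = sum(d)
    for dx in d:
        survivors.append(remaining)   # l_x: everyone who has yet to die
        remaining -= dx
    q = [dx / lx for dx, lx in zip(d, survivors)]
    return survivors, q


# Hypothetical d_x column over three consecutive ages (illustration only).
toy_d = [200.0, 300.0, 500.0]
l_col, q_col = q_from_deaths(toy_d)
print(l_col, q_col)
```

Here q_x falls out at the very end as a quotient of two quantities derived from d_x alone, which is the sense in which it is a by-product.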

Being primarily interested in the absolute number of deaths and not
in the relative frequencies of deaths at various ages, our first
question is therefore, "What is the form of the frequency curve
representing the deaths at various ages among the survivors of the
original group of 100,000 entrants at age 10?" Right here we can,
strange to say, apply some purely a priori knowledge. We know a priori
that the curve must be finite in extent, because of the very fact that
there is a definite limit to human life, and we also know that it
assumes only positive values. There can be no negative numbers of
deaths unless we were to regard the reported theological miracles of
resurrections from the Jewish-Christian religion as such. This
information about the death curve, or the curve of d_x, is, however,
not sufficient for use as a basis for our deductions. We must
therefore look about for additional information, whether of an a
priori or an a posteriori nature and of such a general character that
it can be adopted as a hypothesis.

3. GENERAL PROPERTIES OF THE "DEATH CURVE"

It was Poincaré who once said that every generalization is a
hypothesis. Hence we shall look for some general characteristics which
all mortality tables have in common in the age interval under
consideration (age 10 and upwards). Let us take any mortality table,
I do not care from what part of the world, and examine the general
trend of the curve traced by the values of d_x for various ages. The
curve rises gradually from the age of ten. The number of deaths among
the survivors at various ages will increase, although not uniformly,
until the ages around 70 or 75 are reached. At this age interval we
generally encounter a maximum. From the ages between 70 and 75 and for
higher ages the number of deaths among the survivors will decrease at
a more rapid rate than at the earlier stages of life. After the age of
85 only a small number of the veteran cohort are still alive. After
the age of 90 only a few centenarians struggle along, keeping up a
hopeless fight with the grim reaper, Death, until eventually all are
carried off between the ages of 110 and 115. We can much better
illustrate this process of the struggle between the surviving members
at various ages of the cohort and the opposing forces as marshalled by
the ultimate victor, Death, through a graphical representation. The
chart on page 114 shows a mortality graph of the male population in
Denmark (1906-1910) from ages 10 and upwards as constructed by the
Royal Danish Statistical Bureau. The ordinates of the curve show the
number of deaths at various ages among the survivors of the original
cohort of 100,000 entrants at age 10. We notice a gradual increase
from the younger ages until the age of 77, where a maximum or high
crest is encountered. From that age a rapid decline takes place until
the curve approaches the abscissa with a strongly marked asymptotic
tendency after the age of 90. At the age of 110 all the members of the
cohort have lost out and death stands as the undisputed victor, a
victor among a mass of graves. The curve we thus have traced may
properly be called "The Curve of Death". On the same chart I have also
shown a graphical representation of a comparison between the Danish
death curve and the corresponding death curves of males for England
and Wales in the period 1909-1911, Norway 1900-1910, France 1908-1913
and United States period 1909-1911, all based upon an original radix
of 1,000,000 entrants at age 10.

We will notice quite important variations in these curves. The curves
for the Scandinavian countries show a relatively heavy clustering
around the maximum point which in the case of Denmark is reached at
age 75, in England at age 73, and in France at age 72. The Danish
curve is also more symmetrical and shows a more uniform clustering
tendency around the maximum value than the other curves. The asymmetry
or skewness is most pronounced in the American curve, due to the
comparatively greater number of deaths at younger ages than in the
other tables. In the curve for Norwegian males I might mention another
peculiarity which is absent in most other death curves. I have
reference here to a secondary minor maximum or miniature crest at the
age of 21. This maximum point, which is not very pronounced, arises
from the heavy mortality among youths in Norway, whose male population
always has consisted of rovers of the sea. A much larger proportion of
young men braves the terrors of the sea in Norway than in any other
country in the world. These sturdy descendants of the Vikings can be
found in all parts of the globe. You are sure to find a weatherbeaten
Norwegian tramp steamer even in the most deserted and far away
harbours of our continents. But the sea takes its toll. The result is
shown in the little peak in the curve of death among these sturdy
Norwegian youths.*

Despite all these smaller irregularities all the curves have,
however, certain well defined characteristics, namely:

1) An initial increase with age.

2) A well defined maximum point around the age period 70-80.

3) A more rapid decline from that point until the ultimate end of the
mortality table.



* Another factor is the high number of deaths from 
tuberculosis typical of youth. See in this connexion dis- 
cussion in paragraph 12 a under the Japanese Table. 

4. RELATION OF FREQUENCY CURVES

The most interesting of these common characteristics is the
encountering of a maximum point in the neighborhood of 70, and the
subsequent decline toward the higher ages. This fact has a very
important biometric significance, which we shall discuss in a somewhat
detailed manner. Most of my readers are familiar with the so-called
probability curve, expressed by the equation:

$$\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-M)^2}{2\sigma^2}} \qquad (1)$$

This Laplacean or normal curve is represented in graphical form by
the beautiful bell-shaped curve so well known to mathematical readers.
Various approximations to this curve are continually encountered in
numerous instances of observations relating to certain biological
phenomena where certain measurable attributes of various sample
populations tend to cluster around a certain norm, such as the
measurements of heights of recruits, fin rays in fish, etc. We also
know that where this tendency to cluster around the mean is
asymmetrical or skew, it is in many cases possible to give a very
close representation by the Laplacean-Charlier frequency curves.

Now let us return to our curves of death. It will be noted that all
these curves for ages above the crest period 70 to 75 to a very marked
degree approach the form of the normal probability curve and exhibit a
marked clustering tendency around this particular period. The ages
around 70, the Bible's "three score and ten", can therefore be looked
upon as a norm of life around which the deaths of the original cohort
group themselves in more or less correspondence with the binomial
probability law. This pronounced grouping tendency is a very
significant biological phenomenon, which it might be of interest to
dwell upon.

If all the members of our original cohort were identical as to
physical constitution and characteristics, if they all were exposed to
identically the same outward influences acting upon their mode of
life, it becomes evident from the law of causality, which is the basis
and justification of every collection of statistical data, that all
members would die at the same moment. We see, however, immediately
that such hypothetical conditions are not present in human society.
The paramount feature of our material world is variation. No two
persons are alike in regard to physical constitution. Certain
inherited characteristics, which are present in the individual in more
or less pronounced form, make themselves felt. No two persons or group
of persons can be said to be exposed to the same outward influences.
The clergyman and college professor living a sort of tranquil and
sheltered life are not exposed to the same dangers as the working man
or the man in business life. All these and other factors, almost
infinite in number, tend to produce a decided variation in the actual
duration of life. Of these influencing factors those relating to
purely inherited or natural characteristics are without doubt the most
powerful. If it were possible to eliminate certain forms of deaths due
to infectious diseases, tuberculosis and accidents, causes more or
less due to outward influences, we should have left a number of causes
due to a gradual wearing out of the human system, similar in many
respects to the deterioration of the mechanism in ordinary machinery.
The death curve from such causes of death would be more related to the
normal curve than the death curve which includes causes of death from
non-inherent or anterior causes as mentioned above. This statement is
borne out in the shape of the Danish death curve. In Denmark, where a
very determined and largely successful fight has been carried on
against tuberculosis, and where the accident rate is very low, we also
find that the curve is more symmetrical than for instance in this
country or in England.

This tendency to an approach towards the binomial probability curve
was already noted by Lexis, who from such considerations tried to
determine what he called a "Normalalter" or normal age for various
countries and sample populations. Speaking of this attempt the eminent
Danish statistician, Harald Westergaard, says in his "Statistikens
Teori i Grundrids" (Copenhagen 1916): "An unusually interesting
attempt has been made by Lexis to determine the normal age of man. A
mortality table will, as a rule, have two strongly dominant maximum
points for the number of deaths. During the first year of life there
dies a comparatively large number. From the age of 1 the number of
deaths decreases and reaches its lowest point in early youth. It then
again begins to increase, at times in wavelike motions, until the
maximum point is reached at the old age period".

"The clustering around the latter point has now a great likeness with
the normal or Gaussian curve, and we might for this reason call this
specific age the normal life age. For the calculation of such a normal
age the argument may be put forth that experience shows that the great
variations in mortality tend to disappear in old age. Let the rate of
mortality in a certain generation at age x be μ_x and the number of
the corresponding survivors be l_x. The quantity μ_x l_x will then
increase from a certain point, while l_x decreases, in the beginning
slowly, but later on at a more rapid pace. During a long period of
life the quantity μ_x l_x, the number of deaths at a certain age, will
increase with age. Later on a reversed motion takes place. But when
this reversion will occur depends on many conditions, the successful
fight against certain diseases, progress in economic conditions, or
change in the mode of living. All this exercises an important
influence, and the maximum point occurs therefore sometimes sooner and
sometimes later. It is also important to investigate the natural
selection in old age, which so to say divides the population in
different strata, each with its own state of health. The healthiest of
such groups will with the increase in age play a greater role. Here as
everywhere it is the more important problem to study the clustering
around the mean inside the special groups rather than to attempt to
find a derived expression for the mortality. On the other hand, the
correspondence with the normal curve as established by Lexis is
another testimony to the fact that this curve or formula very often
can be applied, even in complicated expressions".

5. THE "DEATH CURVE" A COMPOUND CURVE

Lexis was satisfied to determine the normal age. A more ambitious
attempt to investigate the mortality by means of frequency curves
throughout the whole period of life was made by the eminent English
biometrician, Pearson, in a brilliant essay in his "Chances of Death".
Pearson took the number of deaths in the English Life Table No. 4
(males) and succeeded in breaking up the compound curve into five
component curves typical of old age, middle age, youth, childhood and
infancy. I want to advise my readers to study this brilliant and
illuminating essay, especially on account of its beautiful form of
exposition which makes the whole subject appear in a most interesting
light.

Speaking of this attempt by Pearson, the American actuary, Henderson,
is of the opinion that "the method has not, however, been applied to
other tables and it is difficult to lay a firm foundation for it,
because no analysis of the deaths into natural divisions by causes or
otherwise has yet been made such that the totals in the various groups
would conform to these (the Pearson) frequency curves". We shall later
on come back to this statement by Henderson, which we feel is a
partial truth only. On the other hand, it must be admitted that
Pearson's types of skew frequency curves (by this time twelve in
number) are by no means easy to handle in practical work and often
require a large amount of arithmetical calculation. Moreover, there
seems to be no rigorous philosophical foundation for the Pearsonian
types of curves, and they can at their best only be said to be
exceedingly powerful and neat instruments of graduation or
interpolation.

On the other hand, I am of the opinion that the goal can be reached
more easily if we, instead of the Pearsonian curve types, make use of
the Laplacean-Charlier and Poisson-Charlier frequency curves, which
are expressed in infinite series of the form:

$$F(x) = \varphi(x) + \beta_3 \varphi^{III}(x) + \beta_4 \varphi^{IV}(x) + \dots, \qquad (2)$$

or

$$F(x) = \psi(x) + \gamma_2 \Delta^2 \psi(x) + \gamma_3 \Delta^3 \psi(x) + \dots \qquad (3)$$

These two curve types have been treated elsewhere by Gram, Charlier,
Thiele, Edgeworth, Jørgensen, Guldberg and other investigators, and it
is therefore not necessary to dwell further upon their analytical
properties, which were discussed in Chapter I.
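For readers who prefer to see the Laplacean-Charlier series in computational form, the following Python sketch evaluates its first three terms. It rests on assumptions that are not stated in this passage: the generating function is taken to be the normal curve with mean M and dispersion σ, the derivatives are taken with respect to x and written out through the Hermite polynomials, and the function names are hypothetical.

```python
import math


def normal_curve(x, mean, sigma):
    """Normal (Laplacean) curve with the given mean and dispersion."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def charlier_type_a(x, mean, sigma, beta3=0.0, beta4=0.0):
    """phi(x) + beta3 * phi'''(x) + beta4 * phi''''(x), derivatives in x."""
    z = (x - mean) / sigma
    phi = normal_curve(x, mean, sigma)
    he3 = z**3 - 3.0 * z                  # Hermite polynomial He_3(z)
    he4 = z**4 - 6.0 * z**2 + 3.0         # Hermite polynomial He_4(z)
    third = -he3 * phi / sigma**3         # third derivative of phi
    fourth = he4 * phi / sigma**4         # fourth derivative of phi
    return phi + beta3 * third + beta4 * fourth


# Hypothetical parameters: a curve centred at age 70 with a skewness term.
for age in (50, 60, 70, 80, 90):
    print(age, charlier_type_a(age, mean=70.0, sigma=8.0, beta3=20.0))
```

With both coefficients set to zero the function reduces to the normal curve itself; a non-zero third-derivative coefficient introduces skewness and a non-zero fourth-derivative coefficient alters the peakedness.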

Returning now to the general form of our d_x curve of the mortality
table which we discussed above, it is readily seen that this curve has
all the properties of a compound frequency curve, that is, a curve
which is composed of several minor or subsidiary frequency curves,
generally skew in appearance. As proven both by Charlier and by
Jørgensen, any single valued and positive compound frequency curve
vanishing at both +∞ and -∞ can be represented as the sum of
Laplacean-Charlier and Poisson-Charlier frequency curves. We know thus
a priori that the d_x curve is compounded of the two types of
frequency curves. But how are we to determine the separate component
curves? It is readily admitted that no a priori reason will guide us
here. The purely empirical observer might therefore abandon the
project right here, because to all appearances it would seem hopeless
to attempt a solution by purely empirical means. The positive
rationalist does not despair so easily. "Very well", he says, "if we
can not make further progress by purely empirical means, we are at
least permitted to try deductive reasoning and attempt to bridge the
gap by means of an hypothesis". The hypothesis I shall adopt is the
following:

The frequency distribution of deaths according to age from certain
groups of causes of death among the survivors in a mortality table
tends to cluster around certain ages in such a manner that the
frequency distribution can be represented by either a
Laplacean-Charlier or a Poisson-Charlier frequency curve.

A study of mortuary records by age and cause of death immediately
supports this hypothesis. We notice, for instance, that diseases such
as scarlet fever, measles, whooping cough and diphtheria often cause
death among children, but rarely seem to affect older people. We know,
for instance, that there is a much greater probability that a 5-year
old boy will die from scarlet fever than a man at the age of 40 will
die from the same disease. On the other hand, there is quite a large
probability that an old man at age 86 will die from diseases of the
prostate gland, while such an occurrence is almost unheard of among
boys. Similarly deaths from cancer and Bright's disease are very rare
in youth, but quite frequent in early old age. Tuberculosis, on the
other hand, causes its greatest ravages in middle life, and has but
little effect upon older ages.



6. MATHEMATICAL PROPERTIES OF THE COMPONENT FREQUENCY CURVES

Leaving, however, the question of the grouping of causes of death
into a limited number of typical groups to a later discussion, we
shall in the meantime see how the hypothesis can carry us over the
difficulties. Let us for the moment assume that we are able to group
the causes of death into say 7 or 8 groups. We shall also assume that
we know the percentage frequency distribution of deaths according to
age in each of the groups. This means in other words that we know the
equation of the frequency curves giving the percentage distribution.
Let the analytical expression for these frequency curves be denoted by
the symbols:

$$F_{I}(x),\ F_{II}(x),\ F_{III}(x),\ \dots,\ F_{VIII}(x). \qquad (4)$$

Again, let the total number of deaths among the survivors in the
mortality table from causes of death according to the above grouping
be denoted by

$$N_{I},\ N_{II},\ N_{III},\ N_{IV},\ \dots,\ N_{VIII} \text{ respectively.} \qquad (5)$$

The number of deaths in a certain age interval, say between 50-54,
can then be expressed as follows:

$$\sum_{x=50}^{54} d_x = \sum_{50}^{54} N_{I} F_{I}(x) + \sum_{50}^{54} N_{II} F_{II}(x) + \dots + \sum_{50}^{54} N_{VIII} F_{VIII}(x). \qquad (6)$$



In this relation the only known quantities are the equations for the
frequency curves F_I(x), F_II(x), ..., F_VIII(x) of the percentage
frequency distribution according to age in each of the eight groups.
Neither d_x nor any of the various N's are known. The only relation we
know a priori among the quantities N is the following:

$$N_{I} + N_{II} + N_{III} + \dots + N_{VIII} = 1{,}000{,}000. \qquad (7)$$

The latter equation is simply a mathematical expression for the
simple fact that the sum total of the sub-totals of the various groups
of causes of death, in other words the deaths from all causes among
the survivors in the mortality table, must equal the radix of the
entrants of our original cohort of 1,000,000 lives at age 10. Viewed
strictly from the standpoint of frequency curves, we might express the
same fact by saying that the sum of the areas of the various component
curves must equal 1,000,000.
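The bookkeeping behind relations (6) and (7) can be sketched in Python. Everything numerical here is an assumption made for illustration: three groups instead of eight, plain normal curves standing in for the component curves F(x), and invented areas, mean ages and dispersions.

```python
import math


def normal_share(x, mean, sigma):
    """Stand-in for a component curve F(x): share of a group's deaths at age x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical groups: (area N, mean age, dispersion). Not taken from the text.
groups = {
    "I": (600_000, 75.0, 8.0),
    "II": (300_000, 60.0, 12.0),
    "III": (100_000, 30.0, 10.0),
}


def deaths_at_age(x):
    """d_x as the sum of N * F(x) over all groups."""
    return sum(n * normal_share(x, m, s) for n, m, s in groups.values())


def deaths_in_interval(first_age, last_age):
    """Deaths between first_age and last_age inclusive, as in relation (6)."""
    return sum(deaths_at_age(x) for x in range(first_age, last_age + 1))


total_area = sum(n for n, _, _ in groups.values())   # relation (7): the radix
print(total_area)
print(round(deaths_in_interval(50, 54)))
```

Once the areas are fixed, d_x follows for any age or age interval; the areas are therefore the only unknowns that remain to be determined. Because the stand-in curves are not cut off at age 10, the yearly values in this toy example sum only approximately to the radix.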

It is readily seen that on the assumption that the expressions of the
different F(x) conform to the above hypothesis it is possible to find
d_x for any age or age interval if we can determine the values of the
different N's. It is in this possibility that the importance of the
proposed method lies, and we shall now show how it is possible to
determine the N's without knowing the exposed to risk.

7. OBSERVATION EQUATIONS

Consider for the moment the following expression:

$$R_{III}(x) = \frac{\sum_{50}^{54} N_{III} F_{III}(x)}{\sum_{50}^{54} N_{I} F_{I}(x) + \sum_{50}^{54} N_{II} F_{II}(x) + \sum_{50}^{54} N_{III} F_{III}(x) + \dots + \sum_{50}^{54} N_{VIII} F_{VIII}(x)} \qquad (8)$$

What does this equation represent? Simply the proportionate ratio of
deaths in group III to the total number of deaths in all type groups
(in other words the deaths from all causes) in the age interval 50-54.
Such ratios are usually known as proportional death ratios. It is
readily seen that these proportionate death ratios are dependent on
the deaths alone and absolutely independent of the number exposed to
risk, provided the total number of deaths from all causes in a certain
age group is large enough to eliminate variations due to random
sampling.¹ In other words, we can find a numerical value for the term
R_III(x) on the left side of the equation from our death records alone
without reference to the exposed to risk in this interval. Similar
proportionate death ratios can of course without difficulty be
determined for the other groups of causes of death and for arbitrary
ages or age intervals. In this manner we can determine a system of
observation equations with known numerical values of R_i(x)
(i = I, II, III, ...). The fact that the number of observation
equations in this system is much larger than the number of the unknown
N's makes it possible to determine these unknowns by the method of
least squares.

¹ Strictly speaking this statement is only true for an age interval
of one year or less and may in the case of large perturbing influences
in the population exposed to risk be subject to appreciable errors
when we use large age intervals of 10 or more in our grouping for the
computing of R(x). When the age interval for the groupings of causes
of deaths by attained ages is 6 years or less the error committed in
assuming R(x) as being independent of the number exposed to risk is in
most cases negligible. One of the difficulties encountered in the
construction of a mortality table for Massachusetts Males was that the
age interval used for the grouping was 10 years instead of 6 years or
less. See in this connection the remarks at the beginning of paragraph
11 and at the conclusion of paragraph 16 of the present chapter.

Probably the simplest manner is first to determine by simple
approximation methods, or by mere inspection, approximate values for
the various N's and then make final adjustments by the method of least
squares.

Let, for instance,

$$'N_{I},\ 'N_{II},\ 'N_{III},\ \dots$$

be the first approximations of the areas of the various groups of
frequency curves so that

$$N_{I} = \alpha_1\, 'N_{I}, \quad N_{II} = \alpha_2\, 'N_{II}, \quad \dots, \quad N_{VIII} = \alpha_8\, 'N_{VIII}. \qquad (9)$$



Let us furthermore introduce the following symbols:

$$'N_{I} F_{I}(x) = \Phi_1(x), \quad 'N_{II} F_{II}(x) = \Phi_2(x), \quad \dots, \quad 'N_{VIII} F_{VIII}(x) = \Phi_8(x). \qquad (10)$$

The different values of

$$\Phi_1(x),\ \Phi_2(x),\ \Phi_3(x),\ \dots,\ \Phi_8(x)$$

may then be regarded as a system of component frequency curves to
which we now must apply the different correction factors α_1, α_2,
α_3, ..., α_8 in order to fit the curves to the observed proportional
death ratios, R(x), for the various groups of typical causes of death.
Let us for example assume that the observed death ratio of a certain
age (or age group), x, under a certain group of causes of death, say
group No. III, is R_III(x). We have then the following observation
equation:

$$R_{III}(x) = \alpha_3 \Phi_3(x) : [\alpha_1 \Phi_1(x) + \alpha_3 \Phi_3(x) + \alpha_4 \Phi_4(x) + \dots + \alpha_8 \Phi_8(x) + \alpha_2 \Phi_2(x)] \qquad (11)$$

Since the sum of the areas of the different component curves
necessarily must equal 1,000,000 it is easy to see that we may write
the factor α_2 in the last term of the denominator in the following
form:

$$\alpha_1 \sum \Phi_1(x) + \alpha_2 \sum \Phi_2(x) + \dots + \alpha_8 \sum \Phi_8(x) = 1{,}000{,}000$$

or

$$\alpha_2 = \Big(1{,}000{,}000 - \big[\alpha_1 \sum \Phi_1(x) + \alpha_3 \sum \Phi_3(x) + \dots + \alpha_8 \sum \Phi_8(x)\big]\Big) : \sum \Phi_2(x) = k_0 - k_1 \alpha_1 - k_3 \alpha_3 - \dots - k_8 \alpha_8,$$

where

$$k_0 = \frac{1{,}000{,}000}{\sum \Phi_2(x)}, \quad k_1 = \frac{\sum \Phi_1(x)}{\sum \Phi_2(x)}, \quad \dots, \quad k_8 = \frac{\sum \Phi_8(x)}{\sum \Phi_2(x)}. \qquad (12)$$

The expression for R_III(x) can then be put in the following form:

$$R_{III}(x) = \alpha_3 \Phi_3(x) : [\alpha_1 \Phi_1(x) + \alpha_3 \Phi_3(x) + \dots + (k_0 - k_1 \alpha_1 - \dots - k_8 \alpha_8)\, \Phi_2(x)].$$

Similar observation equations for the other groups are derived
without difficulty.

Once having formed the observation equations it is simply a matter of
routine work to compute the normal equations from which the values of
the unknown N's can be found. We shall, however, not go into detail
with the derivation of the necessary formulas, since this is a process
which belongs wholly to the domain of the theory of least squares and
which has received adequate treatment elsewhere. (See for instance
Brunt's Combination of Observations.)
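The passage from proportional death ratios to normal equations may be easier to follow with a small numerical sketch in Python. It is not the scheme of correction factors described above but a simplified variant, chosen here only because it is linear: each observation equation is rewritten as R_i(x) ΣN_j F_j(x) - N_i F_i(x) = 0, the last area is eliminated through the condition that all areas sum to the radix, and the remaining areas are found from the normal equations. The function names, the three stand-in normal curves and the "true" areas are all hypothetical.

```python
import math


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_areas(shares, ratios, radix=1_000_000):
    """shares[x][j] = F_j(x), ratios[x][i] = observed R_i(x); returns the N's."""
    k = len(shares[0])
    rows, consts = [], []
    for f_x, r_x in zip(shares, ratios):
        for i in range(k):
            # c[j] multiplies N_j in R_i(x) * sum_j N_j F_j(x) - N_i F_i(x) = 0
            c = [r_x[i] * f_x[j] - (f_x[i] if j == i else 0.0) for j in range(k)]
            rows.append([c[j] - c[k - 1] for j in range(k - 1)])  # last N eliminated
            consts.append(radix * c[k - 1])
    m = k - 1
    normal = [[sum(row[p] * row[q] for row in rows) for q in range(m)] for p in range(m)]
    rhs = [-sum(row[p] * b for row, b in zip(rows, consts)) for p in range(m)]
    free = solve_linear(normal, rhs)
    return free + [radix - sum(free)]


def share(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


curves = [(75.0, 8.0), (60.0, 12.0), (30.0, 10.0)]   # hypothetical component curves
true_areas = [600_000.0, 300_000.0, 100_000.0]       # hypothetical areas to recover
shares = [[share(x, m, s) for m, s in curves] for x in range(20, 90, 10)]
ratios = []
for f_x in shares:
    total = sum(n * f for n, f in zip(true_areas, f_x))
    ratios.append([n * f / total for n, f in zip(true_areas, f_x)])   # R_i(x)

fitted = fit_areas(shares, ratios)
print([round(n) for n in fitted])
```

Note that the ratios fed to `fit_areas` contain no information about the numbers exposed to risk; only the shapes of the component curves and the proportional death ratios enter the computation. With real records the ratios would be subject to sampling error, and the fit would be a genuine adjustment rather than an exact recovery.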

8. CLASSIFICATION OF CAUSES OF DEATH

We think it more advantageous to illustrate the method by a concrete
example. As an illustration we may take the case of Michigan Males in
the period 1909-1915. The mortuary records of Males in Michigan are
for that period given in the reports issued annually by the Secretary
of State on "Registration of Births and Deaths, Marriages and Divorces
in Michigan". The deaths by sex, age and cause of death are given in
quinquennial age groups. A very serious drawback is the grouping of
all ages above 80 into a single age group instead of in at least 4 or
5 quinquennial age groups. This makes it impossible to obtain good
observation equations for ages above 80. When we consider that about
one fifth of the original entrants at age 10 in the mortality table
die after the age of 80, it is readily seen that this defect in the
Michigan data is of a very serious character, which makes it out of
the question to determine correctly the areas of the curves for middle
old age and extreme old age. For ages below 70 these curves do not
play so important a role, and the method ought therefore in these ages
to yield satisfactory results. We now make the assertion that the
deaths among the survivors in the final life table can be grouped in
the following typical groups.

Causes of Death typical of:

Group I       Extreme Old Age.
Group II      Middle Old Age.
Group III     Early Old Age.
Group IV      Middle Life.
Group V       Early Middle Life.
Group VI      Pulmonary Tuberculosis, Etc.
Group VIIa    Early Life Occupational Hazard.
Group VIIb    Middle Life Occupational Hazard.
Group VIII    Childhood.

The classification of causes of death according 
to this scheme is given in the following table, mar- 
ked Table A. 
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Table A, Michigan Males 1909—1915 
Classification of causes of death according to the 

chosen system of curves. 



No. in Inter- 
national Classj 
fication. 

81. 


i. GROUP I 
Diseases of the arteries. 


124. 


Diseases of the bladder. 


125 133. 


Other diseases of the genito-urinary 


142. 
154. 
126. 


system. 
Gangrene. 
Old age. 
Diseases of the prostate. 



GROUP II 

10. Influenza. 

47 — 48. Eheumatism. 

64. Apoplexy. 

65. Softening of the brain. 

66. Paralysis. 

79. Heart disease. 

82. Embolism. 

89. Acute bronchitis. 

90. Chronic bronchitis. 

91 . Broncho-pneumonia. 
94. Congestion of the lungs. 

96 — 97. Asthma and emphysema. 

103. Other diseases of the stomach. 
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No. in Inter- j , , 
national Ctassi- 
, fionljfon. • 

105. Diarrhea and enteritis, (over 2 years) 
14. Dysentery. 

GROUP III 

39. Cancer of the mouth. 

40. Cancer of the stomach and liver. 
;. 41, Cancer of the intestines. 

44. Cancer of the skin. 

45. Cancer af other organs. 

46. Tumors. 
50. Diabetes. 

53 — 54. Leukemia and anemia. 

63. Other diseases of the spinal cord. 
68. Other forms of mental diseases. 
80. Angina pectoris. 
109 — 110. Hernia, intestinal obstruction, and 

other diseases of the intestines. 

120. Bright's disease. 

121. Other diseases of the kidneys 
123. Calculi of urinary passages. 

GROUP IV 
56. Alcoholism. 
18. Erysipelas. 
62. Locomotor ataxia. 
73 — 76. Other diseases of the nervous system. 
77. Pericarditis. 
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No. in Inter- 
national Classi- 
fication. 

78. 

83. 

84. 

85—86. 

87. 
88. 
92. 
93. 
95. 
98. 

99—101. 

111. 

113. 

114. 
115—116. 

118. 
143—145. 

147—149. 



Endocarditis. 

Diseases of the veins. 

Diseases of the lymphatics. 

Other diseases of the circulatory sy- 
stem. 

Diseases of the larynx. 

Diseases of the thyroid body. 

Pneumonia. 

Pleurisy. 

Gangrene of the lungs. 

Other diseases of the respiratory sy- 
stem. 

Diseases of the mouth, pharynx, and 
oesophagus. 

Acute yellow atrophy of the liver. 

Cirrhosis of the liver. 

Biliary calculi. 

Diseases of the liver and spleen. 

Other diseases of the digestive system. 

Furuncle, abscess, and other diseases 
of the skin. 

Diseases of the joints, and locomotor 
system. 



GROUP V 
4. Malarial fever. 
13. Cholera nostras. 
20. Septicemia.
24. Tetanus.
32. Pott's disease.
33. White swellings.
34. Tuberculosis of other organs.
35. Disseminated tuberculosis.
55. Other general diseases.
60. Encephalitis.
70-71. Convulsions.
102. Ulcer of the stomach.
117. Peritonitis.
119. Acute nephritis.
146. Diseases of the bones.
155. Suicide by poison.
156. Suicide by asphyxia.
157. Suicide by hanging.
158. Suicide by drowning.
159. Suicide by firearms.
160. Suicide by cutting instruments.
161. Suicide by jumping from high places.
163. Suicide by other or unspecified means.
164-165. Accidental poisonings.
166. Conflagration.
167. Burns (conflagration excepted).
168. Inhalation of noxious gases.
172. Traumatism by fall.
175 (2). Traumatism by electric railway.
175 (3). Traumatism by automobiles.
175 (4). Traumatism by other vehicles.
176. Traumatism by animals.
178. Cold and freezing.
179. Effects of heat.
185. Fractures and dislocations (cause not specified).



GROUP VI 

28. Tuberculosis of the lungs. 

29. Miliary tuberculosis. 
37 — 38. Venereal diseases. 

186. Other accidental traumatism. 

57 — 59. Chronic poisoning. 

67. General paralysis of the insane. 

31. Abdominal tuberculosis. 

GROUP VII 

1. Typhoid fever. 

69. Epilepsy. 

108. Appendicitis. 

182. Homicide. 

169. Accidental drowning. 

170. Traumatism by firearms. 

171. Traumatism by cutting instruments.

173. Traumatism by mines and quarries.

174. Traumatism by machinery.
175 (1). Traumatism by railroads.

180. Lightning.
61. Meningitis. 

GROUP VIII 

5. Smallpox. 

6. Measles. 

7. Scarlet fever. 

8. Whooping cough. 

9. Diphtheria and croup.
30. Tubercular meningitis. 

150. Congenital malformations. 

9. OUTLINE OF COMPUTING SCHEME

The number of deaths in the various groups according to the
above classification and arranged according to age during the
period 1909-1915 is given in table B on page 140.

From that table it is a simple matter to com- 
pute the proportionate death ratios of the separate 
groups of causes of death. Such a computation is 
shown in table C on page 141.

It is readily seen that these death ratios are independent of the number exposed to risk. Moreover, the number of observations seems to be sufficiently large to eliminate serious variations due to random sampling. This might perhaps not hold true for the age intervals 10 to 14 and 15 to 19, where not only are random sampling errors present, but a somewhat modified classification also seems necessary. I have, however, not used the observed proportionate death ratios for the two younger age intervals in my computations, which took into account only the ratios above age 20. For this reason I do not deem it necessary to go into a closer investigation of a re-classification of causes of death for these younger age groups. A more serious defect, which cannot be overcome, is presented in the ages above 80 where, as mentioned before, a classification according to age is absent in the original records for the state of Michigan. The fact that the highest number of deaths (12,473) occurred in ages above 80 makes this defect more serious than the omission of a re-classification of causes of death below 20.
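
The statement that the proportionate death ratios do not depend on the number exposed to risk can be sketched in a line. The symbols below are introduced only for illustration and are assumptions, not the notation of the text: $E(x)$ is the number exposed to risk at age $x$, $q_B(x)$ the rate of death from group $B$, and $D_B(x) = E(x)\,q_B(x)$ the corresponding deaths.

```latex
R_B(x) \;=\; \frac{D_B(x)}{\sum_i D_i(x)}
       \;=\; \frac{E(x)\,q_B(x)}{\sum_i E(x)\,q_i(x)}
       \;=\; \frac{q_B(x)}{\sum_i q_i(x)} .
```

Under these assumptions the exposure $E(x)$ cancels, so the ratio can be formed from the death records alone.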

So far we have only been concerned with the 
first step in the complete induction according to 
the model of Jevons, namely that of simple observ- 
ation. The next step in the induction is the hypoth- 
esis. We present now the following working 
hypothesis. 

The frequency distribution of deaths according to age of the
above groups of causes of death among the survivors of an
original cohort of 1,000,000 entrants at age 10 can be
represented by a system of frequency curves determined by the
following characteristic parameters:







                              Parameters

Group    Mean (years)   Dispersion (years)   Skewness    Excess
I           79.6             9.5730           + .1066    + .0646
II          70.6            12.8000           + .0967    + .0126
III         65.5            13.6870           + .1248    + .0650
IV          59.5            17.0890           + .1790      .0106
V           55.6            19.9411           + .0556      .0367
VI          44.5            16.035?           - .0124    - .0272
VIIb        67.6            12.1552           + .0008    - .0005
VIIa     Poisson-Charlier Curve: Modulus = 28.5 years, Eccentricity = 1.0001
VIIIa    Poisson-Charlier Curve: Modulus = 13.5 years



From these parameters and from well-known 
tables of the probability or normal frequency curve 
and its various derivatives it is easy to determine 
the frequency distribution for any desired interval. 
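
As a rough illustration of how such a frequency distribution might be evaluated numerically instead of from printed tables, here is a minimal sketch. It assumes the Charlier Type A form in which the skewness and excess multiply the third and fourth derivatives of the normal curve; that sign and scaling convention, and the function names, are assumptions made for this example and should be checked against the formulas given earlier in the text.

```python
import math


def normal_pdf(z):
    """Standard normal frequency curve phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def type_a_density(x, mean, dispersion, skewness, excess):
    """Frequency per year of age at x for a curve of unit area.

    Assumed convention (hypothetical, for illustration):
    F(x) = [phi(z) + skewness * phi'''(z) + excess * phi''''(z)] / dispersion
    """
    z = (x - mean) / dispersion
    phi = normal_pdf(z)
    phi3 = -(z ** 3 - 3.0 * z) * phi               # third derivative of phi
    phi4 = (z ** 4 - 6.0 * z ** 2 + 3.0) * phi     # fourth derivative of phi
    return (phi + skewness * phi3 + excess * phi4) / dispersion


def interval_share(lo, hi, mean, dispersion, skewness, excess, steps=400):
    """Share of the curve's area between ages lo and hi (midpoint rule)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        total += type_a_density(lo + (i + 0.5) * h, mean, dispersion, skewness, excess)
    return total * h


# Group I parameters as printed in the table above.
share_80_84 = interval_share(80.0, 85.0, 79.6, 9.5730, 0.1066, 0.0646)
print(f"Group I share of deaths between ages 80 and 85: {share_80_84:.4f}")
```

Multiplying such a share by the area N of the group gives the number of deaths of that group in the interval.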

For this system of frequency curves we now shall try to find
the various areas $N_{I}, N_{II}, N_{III}, \ldots, N_{VIII}$ so as to
conform to the observed values of $R(x)$ in Table C. As a first
approach to the final values of $N$, we may by an inspection
(which of course is improved upon by a long practice in curve
fitting) choose the following approximations.*

Group            Approximate Value of 'N

I                     123000
II                    366000
III                   183000
IV                    105000
V                      76000
VI                     70000
VIIa & VIIb            61000
VIII                   17000

                     1000000

These preliminary numerical values represent the first approximations of the areas of the various frequency curves. The sequence represented by

$$'N_{I}F_{I}(x),\ 'N_{II}F_{II}(x),\ 'N_{III}F_{III}(x),\ \ldots,\ 'N_{VIII}F_{VIII}(x) \qquad (14)$$

gives the number of deaths at age x. We notice thus that by multiplying the various equations of frequency curves for arbitrary age intervals with their respective 'N's we can get a first approximation of the final death curve. I give on page 144 an approximate table arranged in 5 year intervals.

* These numbers represent as a matter of fact a first rough approximation of the areas of the different component curves by means of the method of point contours. Hence it is to be expected that the final adjustments will be comparatively small. This fact has, however, no influence upon the application of the method.
We might now first compute the various factors $k_0, k_1, k_3, \ldots, k_8$ which will be common for all observation equations. We have, referring to the above formulas (11 and 12), for the various k's:

$$k_0 = \frac{1000000}{365995},\quad k_1 = \frac{123089}{365995},\quad k_3 = \frac{183045}{365995},\quad k_4 = \frac{104888}{365995},$$

$$k_5 = \frac{75030}{365995},\quad k_6 = \frac{69996}{365995},\quad k_7 = \frac{61003}{365995},\quad k_8 = \frac{17002}{365995} \qquad (15)$$

Or

$$k_0 = 2.732,\ k_1 = 0.336,\ k_3 = 0.500,\ k_4 = 0.287,\ k_5 = 0.205,\ k_6 = 0.191,\ k_7 = 0.167,\ k_8 = 0.046.$$
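
The arithmetic of (15) is easy to check mechanically. The following sketch uses the numerators and the common denominator as printed in (15); the function name and the dictionary layout are assumptions made for this example only.

```python
# Approximate areas 'N as they appear in the fractions of (15).
# Keys are group numbers; group 2 supplies the common denominator.
approx_areas = {
    1: 123089, 2: 365995, 3: 183045, 4: 104888,
    5: 75030, 6: 69996, 7: 61003, 8: 17002,
}
RADIX = 1_000_000


def k_factors(areas, radix, pivot=2):
    """Return k0 = radix / N_pivot and k_g = N_g / N_pivot for the other groups."""
    base = areas[pivot]
    factors = {0: radix / base}
    for group, area in areas.items():
        if group != pivot:
            factors[group] = area / base
    return factors


k = k_factors(approx_areas, RADIX)
for index in sorted(k):
    print(f"k{index} = {k[index]:.3f}")
```

Rounded to three decimals, the values agree with those quoted in the text.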

To illustrate the further process of the computation of the observation equations, let us take a certain age interval, say the interval between 50-54. The value of $\Phi_{II}$ taken from the above table is 163.39. The value of $R_{III}(x)$ for this interval is 0.234 (see table page 141). Hence we have the following observation equation (16).

$$0.234 = 104.53\,a_3 : \bigl[\,16.76\,a_1 + 104.53\,a_3 + 84.16\,a_4 + 64.62\,a_5 + 73.55\,a_6 + 35.01\,a_7 + 0.00\,a_8$$
$$\qquad +\ (2.732 - 0.336\,a_1 - 0.500\,a_3 - 0.287\,a_4 - 0.205\,a_5 - 0.191\,a_6 - 0.167\,a_7 - 0.046\,a_8)\,163.39\,\bigr] \qquad (16)$$

After a few simple reductions this may be brought to the following form:

$$9.16\,a_1 + 99.19\,a_3 - 8.72\,a_4 - 7.26\,a_5 - 9.91\,a_6 - 1.81\,a_7 + 1.76\,a_8 - 104.46 = 0. \qquad (17)$$
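
The reduction from (16) to (17) can be retraced numerically. The sketch below is only an illustration: it reads the ratio as 0.234 and the group II value as 163.39, as they appear inside equation (16), and the variable and function names are assumptions of this example. Cross-multiplying (16) gives, for each unknown, the coefficient -R (value - k * pivot), plus the numerator value for the group whose ratio is being fitted, and the constant R * k0 * pivot.

```python
R = 0.234             # observed proportionate death ratio of the fitted group
PIVOT_VALUE = 163.39  # value for group II, eliminated through the k factors
FITTED_GROUP = 3      # group standing in the numerator of (16)

# First-approximation values for the remaining groups, as printed in (16).
values = {1: 16.76, 3: 104.53, 4: 84.16, 5: 64.62, 6: 73.55, 7: 35.01, 8: 0.00}
k = {0: 2.732, 1: 0.336, 3: 0.500, 4: 0.287, 5: 0.205, 6: 0.191, 7: 0.167, 8: 0.046}


def linearize(ratio, values, k, pivot_value, fitted_group):
    """Bring 'ratio = numerator : [denominator]' to the form sum(c_g * a_g) - constant = 0."""
    coefficients = {}
    for group, value in values.items():
        c = -ratio * (value - k[group] * pivot_value)
        if group == fitted_group:
            c += value  # the numerator term moves over with a positive sign
        coefficients[group] = c
    constant = ratio * k[0] * pivot_value
    return coefficients, constant


coefficients, constant = linearize(R, values, k, PIVOT_VALUE, FITTED_GROUP)
for group in sorted(coefficients):
    print(f"a{group}: {coefficients[group]:+.2f}")
print(f"constant: {constant:.2f}")
```

With these inputs the coefficients of a3 to a8 and the constant come out within a few hundredths of the figures in (17). The coefficient of a1 comes out near 8.9 rather than 9.16, which may point to a misprint in one of the printed inputs.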



In the routine work I usually use a system of 
computing the various equations which is out- 
lined in detail in the accompanying tabular scheme 
referring to all the groups in the age interval 
50-54 and shown on pages 148-154. 

Similar observation equations are arrived at in exactly the same manner for other groups and other age intervals. For the whole interval from age 20 and upwards we get in this way 96 observation equations from which to determine the correction factors. The coefficients of these observation equations are then written down, and their various products formed in turn. We do not deem it necessary to give all these observation equations and their coefficients for all the 96 observations, but shall limit ourselves to giving all the necessary computations for the interval 50-54 as previously considered. With the usual system of notation employed in the method of least squares we get the scheme on pages 148-154.

Normal Equations, Michigan Males 1909 — 1915. 

723763   400750   218930   160776   136184   116318    30326   1801162
         877847   263187   176242   149868   129697    34600   2063941
                  237169    90440    72317    62110    16246    964843
                           106346    47022    39939    10676    628608
                                     76774    28909     8668    626296
                                              63378     7012    437390
                                                        2391    111626

The addition of the various columns of the sum 
products of the coefficients gives us finally the 
above set of normal equations of which we only 
submit the coefficients in the usual scheme em- 
ployed in the method of least squares. 

Solving the above system of normal equations 
by means of the well-known method devised by 
Gauss, we obtain finally the values on page 154 for 
the various a's by which the approximate values 
'N must be multiplied in order to yield the prob- 
able values of N. 
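
For readers who wish to retrace the least squares step by machine, the following is a minimal sketch of forming normal equations from observation equations and solving them by elimination. The tiny system used here is hypothetical and chosen only so that the answer is known in advance; it is not the Michigan data. Partial pivoting is added for numerical safety and is not part of the classical hand scheme.

```python
def normal_equations(rows, constants):
    """Form the sum-products [aa], [ab], ... and [an] from observation equations.

    Each observation equation reads sum(rows[i][j] * x[j]) = constants[i].
    """
    n = len(rows[0])
    lhs = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    rhs = [sum(r[i] * c for r, c in zip(rows, constants)) for i in range(n)]
    return lhs, rhs


def gauss_solve(lhs, rhs):
    """Solve a square linear system by elimination and back substitution."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(lhs, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Hypothetical example: four observation equations in two unknown factors.
true_factors = [1.03, 0.97]
rows = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0], [4.0, 2.0]]
constants = [sum(c * t for c, t in zip(row, true_factors)) for row in rows]

lhs, rhs = normal_equations(rows, constants)
solution = gauss_solve(lhs, rhs)
print(solution)
```

The triangular array printed above holds only the upper half of the symmetric coefficient matrix; the lower half follows by symmetry.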

60-64

           gg             gh            gs

          0.0            0.8           0.6
          3.2          188.1          39.6
          1.4           88.3           6.1
          0.8           47.8           0.3
          0.8           46.2          10.3
          0.3           16.7           1.6
         29.2         1740.4          69.7

Sum:   2391.0      -111626.0       -1807.0

60-64

           hh             hs

         72.3           46.9
      10920.3         2299.0
       6416.9          376.4
       2819.6           16.9
       2631.7          684.8
        979.7           93.9
     103877.3         4167.7

Sum: 6630212.0      107368.0

Correction Factors, a. 

Group I       1.03284
Group II      1.00017
Group III     1.03635
Group IV      1.03731
Group V       1.00956
Group VI      0.97334
Group VIIa    0.90332
Group VIIb    0.60565
Group VIII    1.13743
Applying the above correction factors to the respective values
of 'N, we get finally as the total areas of the respective
component curves:

Group

I            127,131
II           366,059
III          189,699
IV           108,750
V             75,747
VI            68,130
VIIa          33,032
VIIb          12,133
VIII          19,339

           1,000,000
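
The final step is a plain multiplication of each approximate area by its correction factor. As a hedged check, the sketch below uses the approximate areas that appear as numerators in (15) together with the printed correction factors, for those groups where the printed figures can be reproduced directly. The remaining groups do not reproduce exactly from the printed figures and are left out of the example.

```python
# Approximate areas 'N (numerators of (15)) and printed correction factors.
approx_areas = {"I": 123089, "III": 183045, "V": 75030, "VI": 69996, "VIII": 17002}
factors = {"I": 1.03284, "III": 1.03635, "V": 1.00956, "VI": 0.97334, "VIII": 1.13743}

# Final area of each component curve, rounded to whole deaths.
final_areas = {group: round(approx_areas[group] * factors[group]) for group in approx_areas}

for group, area in final_areas.items():
    print(f"Group {group}: {area:,}")
```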

Multiplying the equations of the various frequency curves,
F(x), of the percentage distribution in each group with the
above values of N we obtain finally the complete mortality
table as will be given in the Appendix. The final graphical
representation of the frequency curves is shown in Figure 2.

10. GOODNESS OF FIT

This completes the third step in the inductive process. The fourth and final step is the verification of the results thus arrived at by a purely deductive process. Here it must be remembered that the condition which the final component frequency curves shall fulfill is that the observed proportionate death ratios shall agree as closely as possible with the expected or theoretical proportionate death ratios as computed from the final table. In this connection it must be borne in mind that the observed proportionate death ratios are given in quinquennial age groups. Thus the observed proportionate death ratios in a certain age interval, as for example between 50-54, are really the average or "central" proportionate death ratios at age 52. From the complete table it is, however, possible to compute the proportionate death ratios for each specific age. Graphically the expected proportionate death ratios will therefore represent a continuous curve, while the observed ratios will be represented by a rectangular shaped column diagram. Such a graphical representation is shown in Fig. 3, which simply represents the figures in Table C and Table E in graphical form. The "goodness of fit" of the "expected" or theoretical values to the "actual" or observed values is seen to be very close, especially in the largest and most important groups. It is only in the combined groups VIIa and VIIb that the "fit" might be open to criticism for higher ages, but even here the deviation between the actual and theoretical values is small. A very small increase in the area of the VIIb curve would easily adjust this difference. It is, however, doubtful if such a correction or adjustment would have any noteworthy effect upon the ultimate mortality rates $q_x$, and I do not consider it worth while to go to the additional trouble of recomputing the areas, especially in view of the fact that the observation data above the age of 80 are not exact and detailed enough to be used in this method of curve fitting. For ages up to 70 or 75 I consider, however, the table as thus constructed sufficiently accurate for all practical purposes.

11. MASSACHUSETTS, 1914-16

As another example of the method I take the construction of a mortality table for the State of Massachusetts from the mortuary records for the three years 1914, 1915 and 1916. The records as given by the Registration reports are better than the records for Michigan, inasmuch as they have avoided the deplorable practice of grouping all deaths above the age of 80 into a single age group. On the other hand, the classifications of cause of death in Massachusetts by attained age are given in ten year age groups only. Hence it is readily seen that we will only be able to secure half as many observation equations as in the case of the five year interval in Michigan. This rather large grouping puts the method to a severe test. In spite of this drawback I shall, for the benefit of the readers, briefly outline the results I have obtained from an analysis of the Massachusetts data.

While for the Michigan data I employed a sy- 
stem of frequency curves previously used with 
success for certain Scandinavian data, I found it 
was easier to fit the Massachusetts data to a sy- 
stem of frequency curves used in the construction 
of a mortality table for England and Wales for 
the years 1911 and 1912 from the mortuary records 
of deaths by age and cause among male lives. The 
classification by age of the causes of death in 8 
groups is also different from that of Michigan, 
especially for middle life and younger ages. The 
parameters of the system of component frequency 
curves to which I fitted the Massachusetts data are 
shown in the following table F: 

Table F.

Parameters of the System of Frequency Curves
for Massachusetts Males 1914-1916.

Group    Mean (years)   Dispersion (years)   Skewness    Excess
I           78.70            7.9775           + .0920    + .0331
II          68.00           12.2061           + .1161    + .0234
III         63.06           13.0632           + .1210    + .0471
IV          60.46           17.8562           + .0983    - .0091
V           49.60           18.5100           + .0328    - .0309
VI          43.80           14.6760           - .0091    - .0272
VIIb        67.40           12.1560           + .0021    - .0025

VIIa and VIIIa constructed from Poisson-Charlier Curves.
The observed number of deaths according to the 8 groups of causes of death, and their corresponding proportionate death ratios, are given in the following tables G and H.

By first finding approximate values, and then correcting these approximate areas by means of the factors a determined by the method of least squares in exactly the same manner as demonstrated in the case of Michigan, we finally arrive at the following areas of the various groups.



Areas of the component frequency curves in the
Life Table for Massachusetts Males, 1914-1916.

Group            Areas

I                 90064
II               281470
III              207854
IV               151316
V                 99543
VI               107718
VIIa & VIIb       40719
VIIIa             21316

                1000000

Forming the products N F(x) for the various groups and integral ages we obtain finally the life table as shown in the appendix. In order to test the "goodness of fit" of the curves it is necessary to compute the expected or theoretical proportionate death ratios from this latter table and compare such ratios with the observed or actual proportionate death ratios as shown in Table H. The theoretical values are shown in Table I, and a graphical representation illustrating the "goodness of fit" between the observed and theoretical ratios is given in Fig. 5. I think it will be generally admitted that the fit is satisfactory for all practical purposes.

The State of Massachusetts has always been the 
foremost state in the union for reliable and trust- 
worthy statistical records, and in all probability it 
would be possible to secure the deaths by causes in 
5-year age groups instead of ten-year groups. By 
taking the above table as a first approximation one 
should then obtain a very accurate table. On the 
other hand, it is possible to verify the final results 
in the above Life Table for Massachusetts by an 
entirely different process. It happens that the 
State of Massachusetts took a census in April 1915. 
This census for living males by attained ages could 
then be used as an approximation for the exposed 
to risk, while the deaths for the three years could 
be used as a basis for the number of deaths in a 
single year. A Life Table could then be constructed by means
of the orthodox methods usually employed by actuaries and
statisticians in the construction of mortality tables from
census returns.
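
The orthodox check described above could be sketched as follows. This is only an illustration under stated assumptions: the census count is taken as the exposed to risk, one third of the three years' deaths as the deaths of a single year, and the familiar approximation q = 2m / (2 + m) converts the central death rate into a probability of death. The function names and the sample figures are hypothetical and are not Massachusetts data.

```python
def central_death_rate(deaths_in_three_years, census_population):
    """Average yearly deaths divided by the census population at that age."""
    return deaths_in_three_years / 3.0 / census_population


def probability_of_death(m):
    """Usual conversion of a central death rate m into a yearly probability q."""
    return 2.0 * m / (2.0 + m)


def life_table(radix, q_by_age):
    """Survivors at successive ages, starting from the radix."""
    survivors = [float(radix)]
    for q in q_by_age:
        survivors.append(survivors[-1] * (1.0 - q))
    return survivors


# Hypothetical inputs for two consecutive ages.
deaths = [300, 330]          # deaths over three years at each age
population = [10000, 10000]  # census population at each age

q_values = [probability_of_death(central_death_rate(d, p)) for d, p in zip(deaths, population)]
print(life_table(100000, q_values))
```

A table built in this way could then be set beside the table obtained from the frequency curves.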



12. LOCOMOTIVE ENGINEERS' TABLE, 1913-17. OTHER TABLES

As a third illustration, I shall construct a table for American Locomotive Engineers for the period 1913-1917. The statistical data forming the basic table are the mortuary records by attained age and cause of death among the members of The Locomotive Engineers' Life and Accident Insurance Association, a large fraternal order of the American Locomotive Engineers. The total number of deaths in the five year period amounted to more than 4,000. When the deaths were distributed into separate groups of causes of death, it was found possible to use a system of frequency curves similar to that employed for the State of Massachusetts, except for Group No. IV, for which it was found exceedingly difficult to find a single curve which would fit the data; much points towards the actual presence of a compound curve for that group of causes of death among the Locomotive Engineers. The grouping of causes of death is also slightly different from that of Michigan and Massachusetts. I shall not go into further details as to the actual construction of this table, except to mention the areas of the various component frequency curves, of which I present the following table.

Group          Areas

I              44,857
II            342,645
III           226,022
IV            147,420
V              47,650
VI             31,260
VIIa           79,005
VIIb           77,713
VIII            3,428

            1,000,000



It must also be remembered that the radix of this table is taken at age 20, instead of at age 10 as is the case in the preceding tables. The final graph is shown on the preceding page. A number of diagrams illustrating the "goodness of fit" are also attached and need no further comment. It might, however, be of interest to mention the fact that the American actuary, Moir, has recently constructed a mortality table for American Locomotive Engineers along the orthodox lines from the data contained in the Medico-Actuarial Mortality Investigation. Moir's table, or at least the great bulk of the material from which it was derived, falls in the interval between 1900 and 1913. Owing to the energetic "safety first" movement which since 1912 has been actively pursued by most of the leading American railroads, it is, however, to be expected that the period 1913-1917 indicates a reduced mortality as compared with that of Moir's period. This fact is also shown in the diagrams in Fig. 7.* On the other hand, the almost parallel movements of Moir's table and the table derived by the frequency curve method for 1913-1917 seem to indicate the soundness of the proposed method.

Fig. 7.

* Curves I, II and V are Locomotive Engineers' Mortality Tables for various periods.
12 a. ADDITIONAL MORTALITY TABLES

A similar table showing mortality conditions among a decidedly industrial or occupational group has been constructed for coal miners in the United States. The original data of the deaths by ages and specific causes were obtained from the records of several fraternal orders and a large industrial life assurance company and comprised nearly 1600 deaths. The number of deaths above the age of sixty was, however, too small to determine with any degree of exactitude the area of the component curves for the older age groups. For ages below sixty-five the table should on the other hand give a true representation of the mortality among coal miners in American collieries during the period under consideration.* A particular feature of this table is the comparatively low mortality in group VI, which contains primarily deaths from tuberculosis. Coal miners present in this respect different conditions from those usually prevailing in dusty trades, where the death rate from tuberculosis is unusually high. The same feature is also borne out in previous investigations on the death rate of coal miners in England, and by the recent investigations by Mr. F. L. Hoffman on dusty trades in America.

* It was not possible to separate anthracite and bituminous coal miners. The data indicate that anthracite mine workers have a higher accident rate than workers in bituminous mines.

[Figure: Coal Miners.]

In order to have a measure of the mortality prevailing among industrial workers in America, we submit a table derived from a very detailed collection of mortuary records by age, sex and cause of death as published by the Metropolitan Life Insurance Company of New York. A deplorable defect in this splendid collection of data is the grouping together of all ages above seventy in a single age group, which makes it almost impossible to determine the component curves for higher ages with any degree of trustworthiness.

The defect in the original Metropolitan data for older age groups made it necessary to modify the earlier sets or families of curves which were used on the Michigan and Massachusetts data and to combine several of the subsidiary component curves, especially those for the older age groups. Such modifications were, however, easily performed by means of simple logarithmic transformations.

I give below my grouping scheme for the Metro- 
politan data designated by the code numbers of the 
international list of causes of death. The actual 
cause of death corresponding to each code number 
is found under paragraph 8 of the present chapter. 
GROUP I
10, 39 to 46, 48, 50, 54, 63 b, 64 to 66, 68, 79, 81, 
82, 89 to 91, 94, 96, 97, 103, 105, 109 a, 120, 123, 124, 
126, 127, 142, 154. 

GROUP II 
4, 13, 14, 18, 26, 27, 32 to 35, 47 (over age 20), 49, 
51 to 53, 55, 60, 62, 70 to 72, 77, 78, 80, 83 to 88, 92, 
95, 98 to 102, 106, 107, 109 b, 110 to 119, 122, 125, 143 
to 145, 148, 149, 155 to 163. 

GROUP III 
28, 29, 31, 37, 38, 56 to 59, 67. 

GROUP IV a AND IV b 
1, 5 to 9, 17, 19, 20 to 25, 30, 61, 63 a, 73 to 76, 108, 
146, 147, 150, 164 to 186, 47 (under age 20). 

It will be noted that under this scheme Group I 
includes practically Groups I to III of the Michigan 
classification. Group II corresponds partly to IV and 
V for Michigan, Group III is practically Michigan's 
Group VI, while Group IV a and IV b takes in partly 
V, VII, and VIII in the Michigan experience. As a 
further correction I found it also advisable to transfer 
some of the deaths in the age intervals 10-14, 15-19,
20—24, and 25—29 in Groups I and II to Group IV a 
so as to avoid the long left tail ends in these older 
age curves. 
After grouping the deaths (more than 200,000) of the Metropolitan experience according to the above scheme, it is a simple matter to compute the various values of R(x) of the four groups for quinquennial age intervals and use these values (altogether 52 in number) for finding the observation equations and in the subsequent determination of the component curves as shown in the final mortality table in the appendix to this chapter. A comparison between the observed values of R(x) by quinquennial ages and the continuous values of R(x) (indicated by dotted curves) as computed from the final mortality table is shown in Fig. 9. The "fit" between calculated and observed values is evidently satisfactory.

Fig. 9.

A most instructive and unique experience is offered in the table of Japanese Assured Males for the four year period 1914-1917, based upon the death records of more than a dozen of the leading Japanese Life Assurance Companies. About 35,000 deaths by cause, arranged in quinquennial age groups, were available for this construction. The component curves for the older age groups were determined by a simple logarithmic transformation of the variates and offered no particular obstacles in the a priori determination of the parameters. The curves for middle and younger life were more difficult to handle, especially the curves typical of tuberculosis, spinal meningitis and the peculiar Oriental disease known as Kakke, arising from an excessive rice diet. A first attempt to use the same curve types as employed in some of the European and American data resulted in a very poor fit between the observed and calculated values of R(x) for the younger age intervals, clearly indicating that the clustering tendencies were different in the case of the Japanese data from those in the other experiences I had previously dealt with.

The peculiar form of the observed values of R(x) for the tuberculosis group indicated beyond doubt that the frequency curve for this group itself was a compound curve. I therefore decided to include both spinal meningitis and kakke with the tuberculosis group, and treat this new group as a compound frequency curve with two components. By successive trials I finally succeeded in establishing a complete curve system which satisfied the ultimate requirement of the fit between the observed and calculated values of R(x) for the various groups.*

Grouping of Causes of Death in Japanese Assured Males 1914-1917.

GROUP I

Diseases of Arteries, Senility, Influenza, Cerebral Hemorrhage, Acute and Chronic Bronchitis, Bronchopneumonia.

GROUP II

Asthma and Pulmonary Emphysema, Cancer (all forms), Tumor, Diabetes, Other Diseases of Body, Paralytic Dementia, Tabes Dorsalis, Diseases of other organs for circulation of Blood, Chronic Nephritis, Other Diseases of Urinary Organs.

GROUP III

Mental Diseases, Other Diseases of Spine and Medulla Oblongata, Other Diseases of Nervous System, Diseases of Cardiac Valves, Pneumonia, Pleurisy, Other Respiratory Diseases, Gastric Catarrh, Ulcer of Stomach, Hernia, Other Diseases of Stomach, Diseases of Liver, Acute Nephritis, Diseases of Skin and Diseases of Motor Organs.

GROUP IV a AND IV b

Typhoid Fever, Malaria, Cholera, Acute Infectious Diseases, Peritonitis, Suicide, Dysentery, Tuberculosis (all forms), Syphilis, Kakke, Meningitis, Inflammation of the Caecum, Death by external causes (accidents, etc.).

* See Addenda for the final table.

Arranging the collected Japanese statistics on causes of death
among assured males by attained age at death in accordance
with the above scheme of grouping, using a 5 year interval as
the unit, we obtain the following double entry table for the
35207 deaths as used in my computation for the various values
of R(x).



Ages     Group I   Group II   Group III   Group IV   Total

10-14         3         4         37         79       123
15-19        17        23        216        714       970
20-24        37        65        181       1640      1923
25-29        62       109        324       1975      2470
30-34       124       257        800       1993      3174
35-39       278       480       1147       2065      3970
40-44       449       662       1299       1674      4084
45-49       701       957       1352       1482      4491
50-54       742       959       1115        990      3806
55-59       864      1045       1041        728      3678
60-64       865       847        874        482      3068
65-69       626       571        612        186      1995
70-74       399       268        347         80      1094
75-79       123        76        100         20       319
80-84        16        13         10          3        42

The observed values of R(x) as derived from the above table
are shown in the staircase shaped histograph in Fig. 10. The
correlated values of R(x) as calculated from the final
mortality table are shown as dotted curves on the same
diagram. The "fit" between observed and calculated values of
R(x) is evidently satisfactory except for the youngest age
intervals.
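
The observed values of R(x) follow from the double entry table by dividing each group's deaths by the total for the age interval. A minimal sketch, using the figures as printed in the table (the data layout and function name are assumptions of this example):

```python
# Deaths by age interval for Groups I, II, III, IV, as printed in the table.
deaths = {
    "10-14": (3, 4, 37, 79),
    "15-19": (17, 23, 216, 714),
    "20-24": (37, 65, 181, 1640),
    "25-29": (62, 109, 324, 1975),
    "30-34": (124, 257, 800, 1993),
    "35-39": (278, 480, 1147, 2065),
    "40-44": (449, 662, 1299, 1674),
    "45-49": (701, 957, 1352, 1482),
    "50-54": (742, 959, 1115, 990),
    "55-59": (864, 1045, 1041, 728),
    "60-64": (865, 847, 874, 482),
    "65-69": (626, 571, 612, 186),
    "70-74": (399, 268, 347, 80),
    "75-79": (123, 76, 100, 20),
    "80-84": (16, 13, 10, 3),
}


def proportionate_ratios(row):
    """Share of each group in the deaths of one age interval."""
    total = sum(row)
    return tuple(count / total for count in row)


ratios = {ages: proportionate_ratios(row) for ages, row in deaths.items()}
for ages, row in ratios.items():
    print(ages, " ".join(f"{value:.3f}" for value in row))
```

The ratios are computed from the four group entries of each row. For the interval 45-49 these entries add to one more than the printed total, so one figure in that row may be a misprint.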

The construction of the present Japanese table con- 
stitutes probably the most severe trial to which the 
proposed method has hitherto been put. We are here 
dealing with an entirely different race living under 
different economic conditions than the nations of 
Europe and America and afflicted with certain forms 
of diseases which are comparatively rare or unknown 
among the Western nations. 

It is therefore gratifying to note that the eminent Japanese
actuary, Mr. T. Yano, in comparing the above mentioned table
with an investigation he made on the aggregate mortality in
1913-1917 of all the Japanese life assurance companies (about
45 in number) from the actual number of lives exposed to risk
at various ages, has been able to test independently the
validity of the proposed method to complete satisfaction. (See
remarks in preface.)

13. CRITICISMS AND SUMMARY

With these remarks I shall close the mere technical discussion of the proposed method and turn my attention to the arguments advanced by certain American critics against the possibility of constructing mortality tables from records of death alone. I deem no apology necessary to meet those critics and give a brief historical sketch of the origin of the proposed method, because remarks along this line will tend to accentuate the difficulties the mathematically trained biometrician has to contend with in obtaining a hearing among the present day school of actuaries and statisticians.

A good many critics, among whom I may mention Mr. John S. Thompson and Mr. J. P. Little, apparently have received an erroneous impression of the fundamental processes of the proposed method and its evident departure from the conventional methods. Mr. Thompson states: "If we understand the process, the result is simply a graduation of $d_x$, the 'actual' deaths, and it is not apparent why a mortality table should not be formed from the unadjusted deaths and some other function of graduation with equally good results."* From this it would appear that Mr. Thompson is of the opinion that I have graduated the deaths as actually observed. As anyone who will take the trouble to read the above article can see, this is not the case. The actually observed numbers of deaths have only been used to construct the observed proportionate death ratios.**

* Proceedings of the Casualty Actuarial and Statistical Society of America, Vol. IV, pages 399-400.

** These objections by Thompson and Little are shown in their full obscurity in the case of the tables for Locomotive Engineers, Coal Miners and Japanese Assured Males, where the greatest number of observed deaths fell between ages 36-49.

The whole process may be summarized as follows:

1) The choice (a priori) of a system of frequency curves based upon the hypothesis that the distribution of deaths according to age from typical causes of death can be made to conform to those postulated frequency curves whose parameters are known or chosen beforehand.

2) The grouping of causes of death so as to conform with the above mentioned system of frequency curves.

3) The computation for each age or age group of the proportionate death ratios of such groups from the collected statistical data of deaths by age and by cause of death.

4) The choice of approximate values of the 
areas of the various component frequency curves. 
Such approximate values can be determined by 
inspection or by simple linear correlation methods. 

5) The determination by means of the theory 
of least squares of the various correction factors a 
with which the approximate values of the areas 
must be multiplied in order that we may obtain 
the probable values of the areas of the component 
curves. The observation equations necessary for 
this computation are obtained from the observed 
proportionate death ratios, which are indepen- 
dent of the exposed to risk. 

6) The subsequent calculation of the products
NF(x) for all groups and for all integral ages.
This gives us again the total number dying from
all causes at integral ages among the original
cohort of 1,000,000 entrants at age 10; in other
words, the d_x column from which the final
mortality table can be constructed.

7) The computation of the "expected" or
theoretical proportionate death ratios from the
final table and their subsequent comparison with
the "actual" or observed proportionate death
ratios to illustrate the "goodness of fit".

It is this last step which constitutes the verification
of the results derived by means of a purely
deductive or mathematical process, and it is a
very stringent test. It requires, namely, a
simultaneous "fit", not alone for all groups of
causes of death, but for all age intervals as well.
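
Steps 5) to 7) of the summary can be illustrated with a minimal Python sketch. Everything numerical in it is an assumption made for illustration only: three hypothetical groups, normal curves standing in for the frequency curves F(x) (the method itself works with skew curves), synthetic "observed" ratios generated from known areas, and the radix condition treated as one more observation equation. The function names are hypothetical as well.

```python
import math

def normal_curve(mean, sd):
    """Hypothetical stand-in for a frequency curve F(x) of unit area."""
    def f(x):
        z = (x - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return f

def death_ratios(areas, curves, x):
    """Proportionate death ratios R_k(x) = N_k F_k(x) / sum_j N_j F_j(x)."""
    parts = [n * f(x) for n, f in zip(areas, curves)]
    total = sum(parts)
    return [p / total for p in parts]

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_areas(ages, curves, observed, approx, radix=1_000_000):
    """Least-squares correction factors a_j, returned as corrected areas.

    Observation equation for group b at age x:
        R_b(x) * sum_j a_j N'_j F_j(x) - a_b N'_b F_b(x) = 0
    plus sum_j a_j N'_j = radix (an assumed way of fixing the scale).
    """
    k = len(curves)
    rows, rhs = [], []
    for i, x in enumerate(ages):
        nf = [approx[j] * curves[j](x) for j in range(k)]
        for b in range(k):
            row = [observed[i][b] * nf[j] for j in range(k)]
            row[b] -= nf[b]
            rows.append(row)
            rhs.append(0.0)
    rows.append([float(n) for n in approx])
    rhs.append(float(radix))
    # Normal equations (A^T A) a = A^T y
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    aty = [sum(r[p] * y for r, y in zip(rows, rhs)) for p in range(k)]
    factors = solve(ata, aty)
    return [a * n for a, n in zip(factors, approx)]

curves = [normal_curve(30, 12), normal_curve(55, 15), normal_curve(75, 9)]
true_areas = [200_000, 500_000, 300_000]      # synthetic "truth"
ages = list(range(12, 90, 5))                 # 12, 17, ..., 87
observed = [death_ratios(true_areas, curves, x) for x in ages]
approx = [250_000, 450_000, 300_000]          # rough first guess (step 4)
fitted = fit_areas(ages, curves, observed, approx)

# Step 7: compare "expected" with "observed" ratios for all groups and ages.
expected = [death_ratios(fitted, curves, x) for x in ages]
worst = max(abs(e - o) for e_row, o_row in zip(expected, observed)
            for e, o in zip(e_row, o_row))
print([round(n) for n in fitted], worst)
```

Because the synthetic ratios are exact, the fitted areas return to the areas they were generated from; with real data the residuals of step 7 would show how well the chosen curves and grouping fit.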

The sole justification of the proposed method
hinges indeed upon the validity of the hypothesis.
Is it indeed possible to choose a priori a system
of frequency curves to which to fit our observed
data? Theoretically speaking each population or
sample population, as for instance certain
occupational groups such as locomotive engineers,
farmers, textile workers, miners, etc., will in all
probability have its own particular system of
frequency curves. From a purely practical point of
view (and this is the one in which we are chiefly
interested) we may, however, easily get along
with a limited system of frequency curves for the
various groups of causes of death and limit
ourselves to comparatively few sets of frequency
curves to which to fit our statistical data. The
case is analogous to that confronting a manufacturer
of shoes. Undoubtedly the foot of one individual
is different in form from that of any other
individual, and in order to get an absolutely
faultlessly fitting boot we would all have to go to a
custom boot maker. Practical experience shows,
however, that it is possible to manufacture a few
sizes of boots, say 6's, 7's, 8's and intermediate
sizes in quarters and halves, so as to fit to
complete satisfaction the feet of millions of
people. Exactly in the same manner I have found
from a long and varied experience in practical
curve fitting that it is possible to fit the mortuary
records of male deaths by attained age and cause
of death to a comparatively limited number of sets
of component curves, say not more than 5 or 6
sets. Moreover, if in a certain sample population
a certain curve should not exhibit a satisfactory
fit it is indeed a simple matter to change its
parameters so as to improve the fit.

14. ADDITIONAL PRINCIPLES OF METHOD

In regard to the classification of the causes of
death into a limited number of groups it
seems that some of the critics of the method are
of the opinion that this classification is ironclad
and fixed. This, however, is not the case. While
in a specific sample population a certain cause of
death might fall in group II, it is quite likely
that the same cause of death would come under
another group in another sample population. For
instance, the deaths from asthma are in Michigan
grouped under Group II. In the case of Coal
Miners such deaths would, however, go into group
IV or group V. If the classification of causes of
death were fixed, the frequency curves for separate
populations would show great variations, and it
would be out of the question to limit ourselves to
a small set of systems of component curves. Making
the classification flexible, we are, on the other
hand, in a better position to proceed with a smaller
number of curves. For instance, in order to use
the postulated frequency curve for Group VI for
Michigan it was necessary to place the cause of
death listed as No. 186 (other accidental
traumatism) of the International Classification of
Causes of Death in that group instead of in group
V or VII, where most deaths of this type are
ordinarily classed.

It would be interesting to see to what extent
the proposed classification and the chosen system
of frequency curves in Michigan deviate from
the theoretically exact system of frequency curves.
In the case of Michigan it would be impossible to
test this. An approximate test might be obtained
from the Michigan mortality data for the three
year period 1909–1911. Professor Glover has
constructed a mortality table for males in the State
of Michigan in this three-year period by means
of the usual methods employed by actuaries by
resorting to the exposed to risk. Starting with a
radix of 1,000,000 at age 10 it is possible to break
up the deaths or the d_x column of the Glover
table into a set of subsidiary columns of deaths
from groups of causes of death in the same order
as given in Table A on page 133 by means of a
simple application of the observed proportionate
mortality ratios as derived from the 1909–1911
period. On the basis of a radix of 1,000,000
survivors at age 10 we find that according to the
Glover Table, 5016 will die in the interval from
50–54. Let us also suppose that the proportionate
mortality ratio in group III for ages 50–54
amounted to 0.23; then the number of deaths from
group III in that particular interval in the Glover
table would be 5016 × 0.23 = 1154. Similar
numbers could be found for the other groups and for
arbitrary age intervals, and we would in this
manner have an empirical representation of the
frequency curves. This aspect of the matter is treated
in brief form on another page.
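
The arithmetic of this break-up is a single multiplication per group and age interval. A minimal Python sketch follows; the function name `split_deaths` is hypothetical, and only the two figures quoted in the text (5016 deaths and the supposed ratio 0.23) are used.

```python
def split_deaths(total_deaths, ratios):
    """Distribute the life-table deaths of one age interval over cause groups."""
    return {group: total_deaths * ratio for group, ratio in ratios.items()}

glover_deaths_50_54 = 5016        # deaths at ages 50-54, as quoted in the text
ratios_50_54 = {"III": 0.23}      # ratio supposed in the text for group III
parts = split_deaths(glover_deaths_50_54, ratios_50_54)
print(round(parts["III"]))        # 1154
```

Supplying the ratios of every group for every age interval would give the full set of subsidiary death columns described above.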

Returning now to our original discussion, it will 
readily be admitted that the method of construc- 
ting mortality tables by means of compound fre- 
quency curves cannot be considered as absolutely 
rigorous from the standpoint of pure mathematics. 
But neither can the usual methods of constructing 
mortality tables by graduation processes either by 
analytical formulas, mechanical interpolation for- 
mulas or a simple graphical process be considered 
as mathematically exact. All statistical methods
are, in fact, approximation processes. In the 
greater part of the realm of applied mathematics 
we have to resort to such approximation processes. 
It is thus absolutely impossible to solve correctly 
by ordinary algebraic processes simple equations 
of higher degree than the fourth. We encounter, 
however, in every day practice innumerable in- 
stances in which an approximation process, as for 
instance Newton's or Horner's methods or the 
method of finite differences, is sufficiently close to 
determine the roots of any equation so as to satisfy 
all practical requirements. 
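
As a small illustration of such an approximation process, the following Python sketch applies Newton's method to a fifth-degree equation. The equation x^5 - x - 1 = 0 and the starting value are arbitrary choices made for this illustration, not taken from the text.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeat x <- x - f(x)/f'(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")

def f(x):
    return x ** 5 - x - 1      # a quintic chosen only as an example

def df(x):
    return 5 * x ** 4 - 1      # its derivative

root = newton(f, df, 1.5)
print(root, f(root))
```

The root is found to many decimal places in a handful of steps, which is ample for practical requirements.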

From this point of view I claim that the proposed
method in the hands of adequately trained
statisticians will yield satisfactory results, and I
am inclined to think that the results are probably
as true as the ones obtained by means of the usual
methods, which especially in the case of graduation
by interpolation formulas often are affected
with serious systematic errors. Moreover, there
are sound philosophical and biological principles 
underlying the proposed method, which is perhaps 
more than can be said about the usual methods, 
purely empirical in scope and principle. On the 
other hand, I will readily admit that the proposed 
method is by no means a simple rule of thumb
and it can under no circumstances be entrusted to
the hands of amateurs. The whole process can in
my opinion only be employed when placed in the 
hands of the adequately trained statistician who is 
thoroughly familiar with his mathematical tools, 
as provided in the formulas from the probability 
calculus. Such adequate training is not acquired 
over night, but only through a long and patient 
study. Meticulous and patient work is often re- 
quired before one is finally brought upon the right 
track, especially in the classification of the causes 
of death. Failure upon failure is oftentimes en- 
countered by the beginner in this work, and it is 
probably only through such failures that the in- 
vestigator is enabled to avoid the pitfalls of the 
often treacherous facts as disclosed by statistical 
data and steer a clear course. Mathematical skill 
is only acquired through a long and careful study. 
The illustrious saying of the Greek geometer,
Euclid, who once told the Ptolemaic king
that "there is no royal road in mathematics" holds
as true to-day as it did in the days of antiquity.

The fact that the method is no simple mechani- 
cal rule, but one which can be entrusted into skill- 
ful hands only, is, moreover, in my opinion, one 
of its strong points, because it eliminates all attempts
of dilettantes to make use of it. A large
manufacturing plant would not, for instance, put
an ordinary blacksmith or horseshoer to work on
making the fine tools for certain parts of automatic
machinery employed in the manufacture of
staple articles. Only the most skilled and highly 
trained tool makers are able to produce machine 
parts, which often require precision measurements 
running into one thousandth part of an inch. Nor 
would a large contracting firm dream of putting 
a backwoods carpenter in charge of the construc- 
tion of a skyscraper. Yet, this case is absolutely 
analogous to that of letting the mere collector of 
crude statistical data make an analysis and draw 
conclusions from certain collected facts as ex- 
pressed in statistical series of various sorts. 

While some American critics to all appearances
have misunderstood the principles underlying the
method, several European reviewers of the short
summary of the method as originally published in
the "Proceedings of the Casualty Actuarial and
Statistical Society of America" evidently have
understood its fundamental principles completely.
The European critics seem, however, to be of the
opinion that there is a rather prohibitive amount
of arithmetical work involved in the actual
construction of the mortality table. Thus a review in
the Journal of the Royal Statistical Society for
May 1918 has this to say:

"Mr. Fisher's object is to construct a life
table, being given only the deaths at ages and
not the population at risk. The hypothesis
employed is that the total frequency of deaths
can be resolved into specific groups of deaths,
the frequencies of which cluster around certain
ages. The parameters of these sub-frequencies
having been determined, the areas
are deduced from a system of frequency curves
of the form:

$$R_B(x) = \frac{N_B F_B(x)}{N_B F_B(x) + N_C F_C(x) + N_D F_D(x) + \ldots}$$

where R_B(x) is the proportional mortality at
age x of deaths due to causes in group B, and
F_B(x) is obtained from the equation of the
sub-frequency curve for cause B, while N_B +
N_C + N_D + . . . + N_K = 1,000,000. The
values of R(x) provide a system of observational
equations from which (by least squares)
the values of N_B, &c., can be obtained.

"Since particularly in industrial statistics,
or in general statistical inquiries under war
conditions it is easier to obtain accurate data
of deaths at ages than of exposed to risk, the
success of the method is encouraging. It is,
however, to be noted that the amount of
arithmetical work involved is considerable. Quite
apart from the determination of the parameters
of the frequency curves, the formation
and solution of the normal equations needed
to compute the areas is a heavy piece of work.
It would be of interest to see whether the
resolution into but three components effected by
Professor Karl Pearson in his well-known
essay published in the "Chances of Death"
could be made to describe with sufficient
accuracy an ordinary tabulation of deaths from
age 10 onwards to lead to approximately correct
results for life table purposes. The test
should, of course, be made with mortality
data derived from a population very far from
being stationary and the deductions compared
with the results of standard methods. The
subject is one of peculiar interest at the
present time."

From the above quotation it is evident that this 
English reviewer has a clear conception of the 
fundamental principles upon which the method is
based. His criticism is mainly directed against 
the heavy piece of arithmetical work involved. 
This work can, however, not be compared with 
the much more difficult task of obtaining the ex- 
posed to risk at various ages, which under all cir- 
cumstances would take much greater time and be 
infinitely more costly, in fact be absolutely pro- 
hibitive from a financial point of view. I wish in 
this connection to state that the whole arithmeti- 
cal work involved in the construction of the Michi- 
gan table was done by two computers in less than 
70 hours, while the corresponding table for Mas- 
sachusetts took about 75 hours. I do not know if 
this can be called exactly prohibitive. 

In regard to the remarks of my British critic
concerning the Pearsonian method I might add
that in my first attempt at an analysis of mortality
conditions along the lines described above I
tried to subdivide the causes of death into four
groups. It was, however, found that this was not
always sufficient to describe the frequency
distribution of the number of deaths around certain
ages. I doubt whether it is at all possible to
describe the frequency distribution in the various
sub-groups by a system of normal curves, which, of
course, would somewhat lessen the work. I have
made attempts to do this, but so far I have not
been successful except in a few cases.¹ It might
be possible that we should succeed in this if we
first set up a hypothetically determined curve of
the numbers exposed to risk. Such a curve might,
for instance, be a normal curve. Personally, I
believe that little would be gained by such a
procedure. More fruitful appears an analysis by means
of correlation surfaces. The mortality table
constructed by the process as I have described it
constitutes in its final form a correlation surface,
wherein the age at death and the group of causes
of death are the independent variables, and the
number of deaths at a certain age and from a
certain group of causes of death is the numerical
value of the correlation function of the two
variates. Provided one could obtain an exact
equation of such a correlation surface, it would be a
simple matter to construct a mortality table, and
I hope that some statistician may in the future be
induced to attempt a solution of the problem in
this light.

¹ See Addenda for the Metropolitan Table and the
Japanese Table.

15. ANOTHER APPLICATION OF THE FREQUENCY CURVE METHOD

Before closing the discussion of this subject we
shall, however, give a brief description of another
application of compound frequency curves in
the construction of mortality tables. We have here 
reference to the use of skew frequency curves in 
the graduation of crude mortality rates as com- 
puted in the usual empirical manner as the ratio 
of deaths to the number of lives exposed to risk 
at various ages. On page 165 it was mentioned 
that the State of Massachusetts took a census in 
April 1915. This census together with the deaths 
for the triennial period from 1914 — 1916 makes 
it an easy matter to construct a mortality table in 
the conventional manner. Moreover, such a table 
can be compared with the previously constructed 
table from mortuary records by sex, age and cause 
of death only and shown in the appendix. 

In this connection it might be worth mentioning
that my first table for Massachusetts as
constructed by compound frequency curves was
prepared during the summer of 1918 and first
presented in a series of lectures delivered at the
University of Michigan during the month of
March 1919, while the final official report of the
1915 Massachusetts census did not come into the
hands of the present writer before May 1919.

The official census of the population of
Massachusetts by sex and single ages is given on page
478 in Vol. III of the Massachusetts report, from
which Fig. 11 has been constructed. It is seen
at a mere glance at this graph that there is an
unduly high tendency among the figures to cluster 
around ages being multiples of 5. This tendency 
is especially marked in the age interval 30 — 60 
and presents a defect which is of no small im- 
portance in the construction of a mortality table 
by means of the conventional methods. It is in- 
deed doubtful if a table constructed from data 
so greatly influenced by observation errors and 
misstatements of ages can be considered as ab- 
solutely trustworthy. On the other hand the data 
ought to be sufficiently exact to test the results 
arrived at by the proposed method of compound 
frequency curves. 

We give below the male population in 5 year
age groups for the middle census year of 1915
and the corresponding deaths from all causes
during the triennial period 1914–1916.

MASSACHUSETTS
1915 Male Population and Number of Deaths
among Males from 1914–1916.

Ages          Population, L_x    Deaths 1914–16, D_x
5–9               169010               1715
10–14             152419               1004
15–19             154773               1537
20–24             171961               2353
25–29             171017               2726
30–34             149294               2979
35–39             142617               3535
40–44             125462               4007
45–49             107909               4393
50–54              89490               5026
55–59              65133               5459
60–64              49079               5679
65–69              34790               6027
70–74              23638               5946
75–79              13724               4752
80–84               6494               3166
85–89               2479               1751
90–94                530                540
95–99                124                133
100 & over            12                 23


A few small discrepancies will be found to exist 
between this table and the table printed on page 
163, giving the observed deaths from various 
causes in ten year age intervals. This arises solely 
from the fact that a number of deaths were re- 
corded where the contributing cause was unknown 
and could, therefore, not be distributed in their 
proper groups. But this defect is of no influence 
in the construction of a mortality table by means
of the method of compound frequency curves,
unless all the causes reported as unknown should
happen to belong to the same group, which hardly 
can be assumed to be the case. At any rate the 
proportionate death ratios which are the keystone 
in this method of construction are for practical 
purposes left unaltered whether we include or ex- 
clude these few numbers of unknown causes. In 
the usual way of constructing tables from ex- 
posures and number of deaths it is on the other 
hand absolutely essential to include all deaths as 
otherwise the death rate will be underestimated. 
Bearing these facts in mind we therefore refer 
to the above figures of L_x and D_x for Massachusetts
Males from which we without further difficulty
can construct an empirical mortality table,
either by graphic methods or by simple summation
or interpolation formulas. There is indeed no
dearth of such formulas, of which a large number
have been devised by Milne, Wittstein, Woolhouse,
Higham, Sprague, Hardy, King, Spencer, Henderson,
Westergaard, Gram, Karup and several
other investigators. In the following computation 
I have used a formula originally devised by the 
Italian statistician, Novalis, and later on some- 
what modified by the English actuary, King. 
The following schedule shows the actual process
in detail.

MASSACHUSETTS MALES.

A. Population.

Graduated Quinquennial Pivotal Values.

Ages       Population L_x     ΔL_x     Δ²L_{x-5}    Age    Graduated Population
5–9           169010        - 16591
10–14         152419        +  2354    + 18945       12         29332
15–19         154773        + 17188    + 14834       17         30836
20–24         171961        -   944    - 18132       22         34537
25–29         171017        - 21723    - 20779       27         34369
30–34         149294        -  6677    + 15047       32         29739
35–39         142617        - 17155    - 10478       37         28607
40–44         125462        - 17553    -   398       42         25095
45–49         107909        - 18419    -   866       47         21587
50–54          89490        - 24357    -  5938       52         17946
55–59          65133        - 16054    +  8293       57         12961
60–64          49079        - 14289    +  1766       62          9802
65–69          34790        - 11152    +  3137       67          6933
70–74          23638        -  9914    +  1238       72          4717
75–79          13724        -  8130    +  1884       77          2731
80–84           6494        -  4015    +  4115       82          1265
85–89           2479        -  1949    +  2066       87           480
90–94            530        -   406    +  1543       92           104
95–99            124        -   112    +   294       97            23
100–104           12                                 102             1

Graduated Population:

$$u_{x+2} = 0.2\,L_x - 0.008\,\Delta^2 L_{x-5}$$

B. Deaths 1914–1916.

Graduated Quinquennial Pivotal Values.

Ages       No. of Deaths D_x    ΔD_x     Δ²D_{x-5}    Age    Graduated Deaths
5–9             1715           -  711
10–14           1004           +  533    + 1244        12         200.8
15–19           1537           +  816    +  283        17         307.4
20–24           2353           +  373    -  443        22         470.6
25–29           2726           +  253    -  120        27         545.2
30–34           2979           +  556    +  303        32         595.8
35–39           3535           +  472    -   84        37         707.0
40–44           4007           +  386    -   86        42         801.4
45–49           4393           +  633    +  247        47         878.6
50–54           5026           +  433    -  200        52        1005.2
55–59           5459           +  220    -  213        57        1091.8
60–64           5679           +  348    +  128        62        1125.8
65–69           6027           -   81    -  429        67        1205.4
70–74           5946           - 1194    - 1113        72        1189.2
75–79           4752           - 1586    -  392        77         950.4
80–84           3166           - 1415    +  171        82         633.2
85–89           1751           - 1211    +  204        87         350.2
90–94            540           -  407    +  804        92         108.0
95–99            133           -  110    +  297        97          26.6
100–104           23                                  102           4.6

In this manner we obtain the graduated quinquennial
pivotal values of the population and of
the deaths for ages 12, 17, 22, 27, ... etc. Then
by dividing one third of the graduated deaths by
the population we have the graduated pivotal
values of the so-called "central death rates", or
m_x, for quinquennial ages from age 12 and up.
From these values of m_x we easily find the
corresponding values of q_x by means of the formula:

$$q_x = \frac{2 m_x}{2 + m_x}$$
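
The pivotal-value arithmetic can be retraced for a single age with a short Python sketch. It reads the population formula as u_{x+2} = 0.2 L_x - 0.008 Δ²L_{x-5} and uses the usual relation q_x = 2m_x / (2 + m_x); both readings, and the function names, are assumptions of this sketch. The input figures are those printed in the schedules for ages 10–24.

```python
def pivotal_population(l_prev, l_mid, l_next):
    """Pivotal value at the central age of the middle quinquennial group."""
    second_difference = l_next - 2 * l_mid + l_prev
    return 0.2 * l_mid - 0.008 * second_difference

def central_rate_to_q(m):
    """Convert a central death rate m_x into a yearly death probability q_x."""
    return 2.0 * m / (2.0 + m)

# Populations of the groups 10-14, 15-19 and 20-24 (pivotal age 17).
u_17 = pivotal_population(152419, 154773, 171961)

# Graduated deaths at age 17 as printed in schedule B, covering three years.
graduated_deaths_17 = 307.4
m_17 = (graduated_deaths_17 / 3.0) / u_17
q_17 = central_rate_to_q(m_17)

print(round(u_17), round(1000 * m_17, 2), round(1000 * q_17, 2))
```

The pivotal population agrees with the 30836 printed for age 17; the rate per thousand that follows is of the same order as the tabulated figure.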

We give below the results of this computation.

Massachusetts Males 1914–1916.

Age    1000 q_x from Novalis' Formula
12          2.21
17          3.33
22          4.64
27          5.29
32          6.68
37          8.25
42         10.65
47         13.53
52         18.67
57         26.38
62         38.29
67         58.12
72         81.90
77        109.91
82        165.02
87        240.18
92        325.64

The intervening values of q_x are without difficulty
derived by interpolation formulas or by a
graphical process. Once having all the values of
q_x for separate ages from age 10 and up it is a
simple matter to form tables of l_x and d_x commencing
with a radix of 1,000,000 at age 10. Without
going into tedious details we present the following
values of l_x for decennial ages.

Massachusetts Males 1914–1916.

Age       l_x          Ages          Σd_x
10     1,000,000      10–19         27,700
20       972,300      20–29         47,330
30       924,970      30–39         66,750
40       858,220      40–49         98,650
50       759,570      50–59        153,900
60       605,670      60–69        233,150
70       372,520      70–79        237,130
80       135,390      80–89        124,760
90        10,640      90 & over     10,640
100           32
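
The passage from q_x to l_x and d_x, and from l_x to the deaths of each ten-year interval, can be sketched in Python as follows. The function names and the three yearly q_x values are hypothetical; the l_x figures are those printed above, with age 90 treated as the open interval "90 & over".

```python
def life_table(q_by_age, radix=1_000_000):
    """Return rows (age, l_x, d_x) built from yearly death probabilities q_x."""
    rows = []
    survivors = float(radix)
    for age in sorted(q_by_age):
        deaths = survivors * q_by_age[age]
        rows.append((age, survivors, deaths))
        survivors -= deaths            # l_{x+1} = l_x - d_x
    return rows

def interval_deaths(l_by_age):
    """Deaths between successive tabulated ages; the last interval is open."""
    ages = sorted(l_by_age)
    out = {}
    for this_age, next_age in zip(ages, ages[1:]):
        out[this_age] = l_by_age[this_age] - l_by_age[next_age]
    out[ages[-1]] = l_by_age[ages[-1]]
    return out

# Hypothetical q_x values, only to show the recursion.
demo = life_table({10: 0.002, 11: 0.0021, 12: 0.0022})

# l_x at decennial ages as printed in the text.
l_x = {10: 1_000_000, 20: 972_300, 30: 924_970, 40: 858_220, 50: 759_570,
       60: 605_670, 70: 372_520, 80: 135_390, 90: 10_640}
d_10yr = interval_deaths(l_x)
print(d_10yr[40])                      # 98650
```

The figure for ages 40–49 is the one carried forward into the break-up by causes of death.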



16. GRADUATION BY FREQUENCY CURVES

It is to this table that we now shall apply a
process of re-graduation by means of the
method of compound frequency curves. Here we
have already an empirical representation of the
total compound curve of death or the d_x curve.
This compound curve can now by simple and
straightforward processes be broken up into its
various component parts as to causes of deaths by
means of the various observed proportionate
mortality ratios, R_x, shown in Table H on page 163.

Let us for the sake of illustration take the age 
interval 40 — 49. According to our empirically con- 
structed table as derived from the Massachusetts 
1915 census we find that the number of deaths 
among the survivors in this age interval amounts 
to 98,650. 

Applying to this number the observed proportionate
death ratios, R_x, in Table H we are able to
break this number up into its various component
parts according to the groups of causes of death
from which the numerical values of R_x were
derived. These component parts are as follows:

Group         No. of Deaths
I                  1180
II                18050
III               17170
IV                17170
V                 14300
VI                23970
VIIa & b           5820
VIII                990

Total:            98650
In the same manner we can break up the compound
curve (the d_x curve) into its eight component
parts for all other age intervals, which finally gives
us the following table of component groups,
printed on the preceding page, and graphically this
table will represent a series of frequency diagrams
of the various groups of causes of deaths. It is an
easy matter to fit such diagrams to a system of
Laplacean-Charlier or Poisson-Charlier frequency
curves, which symbolically may be represented as
follows:

$$N_{I} F_{I}(x),\; N_{II} F_{II}(x),\; N_{III} F_{III}(x),\; \ldots$$

where F(x) is the frequency function of the
percentage distribution according to age of the
various component groups or curves, while N stands
for the areas of such curves.

These curve areas are simply the sub-totals of
the respective groups in the above table. The
parameters giving the equations of the curves F_I(x),
F_II(x), F_III(x), ... are easily computed by the
method of moments and are shown in the following
table on page 207.
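
A minimal Python sketch of a moment computation for one component group is given here. The grouped data are hypothetical, no correction for grouping is applied, and skewness and excess are taken in their common textbook forms (third central moment over the cube of the dispersion, fourth central moment over its fourth power less 3). The coefficients used with Charlier-type curves may be scaled differently, so the numbers are not directly comparable with the tabulated parameters.

```python
import math

def grouped_moments(midpoints, counts):
    """Mean, dispersion, skewness and excess of a grouped frequency diagram."""
    total = sum(counts)
    mean = sum(x * n for x, n in zip(midpoints, counts)) / total

    def central(power):
        return sum(n * (x - mean) ** power
                   for x, n in zip(midpoints, counts)) / total

    dispersion = math.sqrt(central(2))
    skewness = central(3) / dispersion ** 3
    excess = central(4) / dispersion ** 4 - 3.0
    return mean, dispersion, skewness, excess

# Hypothetical deaths of one group at the midpoints of ten-year age intervals.
midpoints = [45, 55, 65, 75, 85]
counts = [1, 4, 6, 4, 1]
mean, dispersion, skewness, excess = grouped_moments(midpoints, counts)
print(mean, dispersion, skewness, excess)
```

For this symmetrical example the mean is 65, the dispersion 10, the skewness zero and the excess negative.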

Once having determined the parameters of the
various frequency curves it is a simple matter to
construct the final mortality table which is shown
in the addenda.

Values of Parameters of Component Curves.¹

Group     Mean     Dispersion     Skewness     Excess
I         75.0        9.78        + 0.080       0.005
II        67.5       13.65        + 0.117     + 0.017
III       64.0       14.12        + 0.124     + 0.030
IV        60.5       16.51        + 0.089     - 0.006
V         50.0       18.61        + 0.026     - 0.034
VI        43.5       15.57          0.036       0.023
VIIb      57.5       16.33          0.027     - 0.028



¹ In this grouping I have combined VIIa and VIII
into a single group and roughly fitted this group to a
truncated Poisson-Charlier curve. This, of course, is not
exact and evidently introduces errors in the younger
age interval from 10–19. For ages above 20 this curve
is of no importance and the other curves should for
the ages above 20 give a satisfactory fit. If absolute
exactitude were required for younger ages it would
indeed offer no difficulties to compute curves VIIa and
VIII separately and thus obtain a much closer fit in
the youngest age interval. In view of the fact that
the present calculation is a test case only, it has not
been thought necessary to go to these refinements.
This defect will of course also affect to a slight extent
group VIIb.

It now remains for us to compare the final values
of q_x which we obtain from the three tables:

A) The values of q_x as computed in the usual
way from the number of lives exposed to risk and
the corresponding deaths at various ages.

B) The values of q_x as obtained by a re-graduation
of the mortality table under A by means of
compound frequency curves.

C) The values of q_x constructed from mortuary
records by sex, age and cause of death, but
without knowing the numbers of lives exposed to risk.



Values of 1000 q_x by various methods.

Age        A          B          C
17        3.33       3.15       3.27
22        4.64       3.99       4.28
27        5.29       5.04       5.46
32        6.68       6.72       7.03
37        8.25       8.63       8.88
42       10.65      10.83      11.05
47       13.53      13.86      14.05
52       18.67      18.83      19.13
57       26.38      26.88      27.66
62       38.29      38.79      40.26
67       58.12      59.04      56.54
72       81.90      76.50      77.61
77      109.91     103.69     107.51
82      165.02     137.97     148.79



I think that every unbiased investigator will
admit that there exists a close agreement
between the three series. It is indeed difficult to
say which one of the three is the most probable.
We know that on account of the great perturbations
due to misstatements of ages the values
under A are affected with considerable errors. The
usual interpolation or summation formulas do not
suffice to remove these errors and tend often to
increase them. A re-graduation by means of
frequency curves as shown in series B will in all
probability give better results, although on
account of the large age interval (10 years) in which
the causes of deaths are grouped in the
Massachusetts reports this method does not come to its
full right¹. The values of q_x under A and B are
naturally closely related to each other, and those 
in series B cannot be derived unless the values 
in series A are known beforehand. Series C on 
the other hand is independent of either A or B, 
having been derived by means of entirely different 
methods of construction. 
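
The degree of agreement between the three series can be put in figures with a few lines of Python. The values are those of the comparison table; taking series A as the reference is an arbitrary choice made for this sketch, since A itself is affected by misstatements of age.

```python
ages = [17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82]
a = [3.33, 4.64, 5.29, 6.68, 8.25, 10.65, 13.53, 18.67,
     26.38, 38.29, 58.12, 81.90, 109.91, 165.02]
b = [3.15, 3.99, 5.04, 6.72, 8.63, 10.83, 13.86, 18.83,
     26.88, 38.79, 59.04, 76.50, 103.69, 137.97]
c = [3.27, 4.28, 5.46, 7.03, 8.88, 11.05, 14.05, 19.13,
     27.66, 40.26, 56.54, 77.61, 107.51, 148.79]

def relative_deviation(series, reference):
    """Relative deviation of each value of 1000 q_x from the reference."""
    return {age: (s - r) / r for age, s, r in zip(ages, series, reference)}

dev_b = relative_deviation(b, a)
dev_c = relative_deviation(c, a)
worst_b = max(dev_b, key=lambda age: abs(dev_b[age]))
worst_c = max(dev_c, key=lambda age: abs(dev_c[age]))

for age in ages:
    print(age, f"{100 * dev_b[age]:+.1f}%", f"{100 * dev_c[age]:+.1f}%")
```

On this reckoning the largest relative differences from series A fall at age 82 for both B and C.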



¹ See footnote on page 127.

17. COMPARISON OF DIFFERENT METHODS

A comparison between the parameters of the separate
component curves in B and C gives us, however,
a way of testing the validity of the hypothesis
upon which the method of series C rests. In the case
of series C we started with the hypothesis of the
existence of a set of frequency curves of the
percentage distribution of the number of deaths
according to age among the various groups. On the
basis of this hypothesis and from the observed values
of the proportionate death ratios, $R_B(x)$, we
determined by the method of least squares the areas
of this postulated set of frequency curves. In the
case of the B series we broke up the empirically
constructed compound death curve (the $d_x$ curve)
into its various component parts according to a
classification of causes of death similar to that
under C. We have therefore in this case an empirical
determination of the areas of the component curves,
and all that we need to do is to graduate the rough
frequency diagrams represented by such areas to a
system of frequency curves.
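
The least-squares determination of the areas described above can be illustrated by a small sketch. It is not the author's computation: the three normal curves stand in for the skew frequency curves, the data are synthetic, and the names `normal_curve`, `solve` and `fit_areas` are hypothetical. The sketch only assumes that each area $N_B$ must satisfy $R_B(x)\sum_k N_k f_k(x) - N_B f_B(x) = 0$ at every age, with the areas adding up to the radix of 1,000,000.

```python
import math


def normal_curve(mean, dispersion):
    """Unit-area normal curve, used here as a stand-in for a skew frequency curve."""
    def f(x):
        z = (x - mean) / dispersion
        return math.exp(-0.5 * z * z) / (dispersion * math.sqrt(2.0 * math.pi))
    return f


def solve(a, b):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_areas(curves, ratios, ages, radix=1_000_000.0):
    """Least-squares areas N_k from observed proportionate death ratios."""
    k = len(curves)
    rows, rhs = [], []
    for x in ages:
        f = [curve(x) for curve in curves]
        for b in range(k):
            # R_B(x) * sum_j N_j f_j(x) - N_B f_B(x) = 0, written as a row in N
            rows.append([(ratios[b][x] - (1.0 if j == b else 0.0)) * f[j]
                         for j in range(k)])
            rhs.append(0.0)
    rows.append([1.0] * k)   # the areas must add up to the radix
    rhs.append(radix)
    # Normal equations (A^T A) N = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(k)]
    return solve(ata, atb)


# Synthetic check: build exact ratios from known areas, then recover the areas.
curves = [normal_curve(45.0, 15.0), normal_curve(60.0, 13.0), normal_curve(75.0, 9.0)]
true_areas = [200_000.0, 300_000.0, 500_000.0]
ages = range(20, 95, 5)
ratios = []
for b in range(3):
    ratios.append({x: true_areas[b] * curves[b](x)
                   / sum(n * c(x) for n, c in zip(true_areas, curves))
                   for x in ages})
estimated = fit_areas(curves, ratios, ages)
print([round(v) for v in estimated])
```

With real mortuary records the ratios would carry sampling errors, and the relative weight given to the radix condition would then need some thought; here it is simply one more equation.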

Let us now briefly examine how far the various 
skew frequency curves in series B and C differ 
from each other. In regard to the various statis- 
tical parameters of the separate groups we have 
the following results : 

Means.

Group     Series C    Series B
I           78.5        75.0
II          68.0        67.5
III         63.0        64.0
IV          60.5        60.5
V           49.5        50.0
VI          44.0        43.5
VIIb        57.5        57.5


Dispersions.

Group     Series C    Series B
I           7.98        9.78
II         12.21       13.65
III        13.05       14.12
IV         17.86       16.51
V          18.51       18.61
VI         14.68       15.57
VIIb       12.16       16.33

Skewness.

Group     Series C    Series B
I         +0.092      +0.080
II        +0.115      +0.117
III       +0.121      +0.124
IV        +0.098      +0.089
V         +0.033      +0.026
VI         0.010      -0.036
VIIb       0.002      -0.027



Excess.

Group     Series C    Series B
I          0.033       0.005
II        +0.023      +0.017
III       +0.047      +0.030
IV        -0.009      -0.006
V         -0.031      -0.034
VI         0.027       0.023
VIIb      -0.003       0.028



Taken all in all, there is found to exist a satisfactory
agreement between the hypothetical values
in series C and the values derived by empirical
methods. It is only in group I that we find
some important discrepancies. This group contains
causes of death typical of extreme old age, where
we naturally may expect great perturbations
owing to large errors from random sampling,
especially in series B. In this same connection
we may also mention that the empirically determined
values under series B are subject to a slight
correction by means of the Sheppard formulas,
which were not employed in my computations.
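
As an illustration of the size of such a correction, the following sketch applies Sheppard's correction for grouping to a dispersion, that is, it subtracts one twelfth of the squared class interval from the variance. The 10-year interval is an assumption taken from the Massachusetts grouping mentioned earlier, and the function name is hypothetical; the text does not say which interval the author would have used.

```python
import math


def sheppard_corrected_dispersion(raw_dispersion, interval):
    """Dispersion after Sheppard's correction: variance minus interval^2 / 12."""
    corrected_variance = raw_dispersion ** 2 - interval ** 2 / 12.0
    if corrected_variance < 0:
        raise ValueError("interval too wide for this dispersion")
    return math.sqrt(corrected_variance)


# Group I, series B: printed dispersion 9.78 years; 10-year grouping is assumed.
corrected = sheppard_corrected_dispersion(9.78, 10.0)
print(round(corrected, 2))
```

Under this assumption the dispersion of group I in series B moves somewhat closer to the 7.98 of series C, but by no means all the way.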

We have already mentioned that the system
of frequency curves which we chose a priori
for Massachusetts (Series C) was the same system
which we had used on a previous occasion
in the construction of a mortality table for English
Males for the period 1911-1912.* This is a
fact of no small importance. It will in general be
found that the percentage distribution according
to age in the various component curves differs
little in different sample populations. Even in the
case of American Locomotive Engineers it was
found possible to use the same set of curves as in
the case of Massachusetts and England and Wales.
In the same way I have found that the set of
curves used in the construction of the table of
Michigan Males can also be used in the case of
males in the urban population of Denmark. With
very few exceptions I have found it possible
to get along with a limited number of sets of
curves, say four or five sets. Should it nevertheless
prove impossible to fit the original data to
any one of these particular curve systems, it will
in most cases be found possible by means of successive
approximations to reach a system of curves
which may be made the a priori basis for the
construction of the final table, as was the case in
the table for Japanese assured males.

Finally we come to the comparison of the vari- 
ous areas of the component curves. We have 
here : 



* See "Proceedings of the Casualty Actuarial Society
of America", Vol. IV, page 409.

Areas.

Group              C          B
I              90064     105000
II            281470     296190
III           207854     213010
IV            151316     144200
V              99543      87850
VI            107718     106260
VII & VIII     62035      47410

Total        1000000    1000000

Evidently the agreement is not so close in this
case. But it would indeed be rather rash to assert
that the values in series C are faulty. One must
here bear in mind the diametrically opposite
principles employed in the determination of these
areas. In series B we have a direct determination
by empirical methods. In this determination we
shall, however, find reflected all the
systematic and observational errors originally present
in series A, from which the curves under B
were computed. Every error due to misstatements
of ages and every systematic error introduced by the
summation or interpolation formulas will be directly
reflected in the areas under series B, and
such areas can therefore in a sense only be considered
as a first approximation to the true or
presumptive areas.

Another point well worth remembering is
that no conditions are imposed upon the areas
in series B. In series C, where we work with mortuary
records only, we have on the other hand the
very important condition or restriction requiring
that the areas of the component curves must be
so determined that their ratios to the compound
curve for various age intervals will conform as
closely as possible to the observed proportionate
death ratios, $R_B(x)$, for those same age intervals.

In order to test the influence of this additional
requirement in respect to conformity to observed 
proportionate death ratios we might use the values 
of the component curves under series B as a first 
approximation and then afterwards determine the 
correction factors a for the areas in exactly the 
same way as in the case of series C. No doubt 
such a calculation would tend to improve the 
table. 

A difficulty occurs, however, in the case of
the Massachusetts data owing to the large interval
of 10 years into which the causes of death by
attained ages are grouped. As pointed out in the
footnote on page 127, the quantity $R_B(x)$,
$(x = 10, 11, 12, \ldots, 100;\ B = \mathrm{I}, \mathrm{II}, \mathrm{III}, \ldots)$,
can only be considered as being independent of
the "exposed to risk" if the age interval into which
the deaths fall is sufficiently small. If this is not
the case, the "central" values of $R_B(x)$ are
subject to certain corrections. In the case of the
groups of causes of death typical of younger ages
the observed "central" values of $R_{VII}(x)$ and
$R_{VIII}(x)$ for the age intervals 10-19, 20-29,
30-39 are evidently too high, while on the other
hand the values of $R_{I}(x)$ and $R_{II}(x)$ in the case
of the age intervals 60-69, 70-79, 80-89,
90-100 are too low as compared with the true
values of $R(x)$ at these "central" ages. I have,
however, tacitly ignored this fact in my computations.
The result is that the final
values of $q_x$ for the younger ages in column C as
shown on page 208 are in all probability a little
too high, and the values of $q_x$ above 65 too low.
In the case of the other tables shown in the
present book the age interval into which the causes
of death were arranged was 6 years or less, and
the error was thus reduced to such an extent that
further corrections may be disregarded for all
practical purposes.

Showing Detailed Mortality Tables and Death
Curves for

1) Japanese Assured Males (1914-1917)
2) Metropolitan Life, White Males (1911-1916)
3) American Coal Miners (1913-1917)
4) American Locomotive Engineers (1913-1917)
5) Massachusetts Males (Series C) (1914-1916)
6) Michigan Males (1909-1915)
7) Massachusetts Males (Series B) (1914-1916).

Mortality Table: Japanese Assured Males, 1914-1917 (Aggregate Table)

Age      I     II    III    IVa    IVb     d_x      l_x  1000 q_x
 15     24     65    343      -   2379    2811  1000000     2.81
 16     39     74    360      -   3645    4118   997189     4.13
 17     43     84    388      -   4888    5403   993071     5.44
 18     48     93    415      -   5981    6557   987668     6.64
 19     54    107    446      -   6826    7433   981111     7.68
 20     60    120    478      -   7447    8105   973678     8.32
 21     68    135    513      -   7716    8432   965573     8.73
 22     77    153    550     12   7734    8526   957141     8.91
 23     87    171    591     27   7581    8457   948615     8.92
 24    101    195    633     60   7274    8253   940158     8.86
 25    111    218    678     77   6864    7948   931905     8.63
 26    126    246    729    112   6384    7597   923957     8.22
 27    140    278    780    153   5860    7211   916360     7.87
 28    160    315    838    206   5341    6860   909149     7.54
 29    178    353    899    268   4821    6519   902289     7.22
 30    198    395    963    341   4323    6220   895770     6.94
 31    227    446   1033    425   3853    5984   889550     6.73
 32    252    501   1109    521   3421    5804   883566     6.59
 33    286    557   1185    629   3021    5678   877762     6.46
 34    319    626   1273    751   2665    5633   872084     6.46
 35    358    700   1364    885   2336    5643   866451     6.51
 36    401    779   1460   1031   2048    5719   860808     6.64
 37    450    872   1564   1186   1797    5869   855089     6.86
 38    502    970   1671   1350   1566    6059   849220     7.13
 39    570   1081   1791   1524   1366    6332   843161     7.51
 40    638   1197   1916   1701   1191    6643   836829     7.94
 41    716   1332   2049   1883   1037    7017   830186     8.45
 42    802   1475   2193   2066    903    7439   823169     9.04
 43    899   1632   2341   2249    783    7904   815730     9.69
 44   1005   1799   2501   2428    680    8413   807826    10.41
 45   1126   1985   2671   2599    598    8979   799413    11.23
 46   1261   2180   2852   2764    514    9571   790434    12.10
 47   1406   2393   3042   2917    447   10205   780863    13.07
 48   1575   2611   3236   3061    395   10878   770658    14.12
 49   1754   2867   3459   3187    339   11606   759780    15.27
 50   1957   3122   3666   3298    295   12338   748174    16.49
 51   2180   3395   3892   3389    257   13113   735836    17.82
 52   2426   3679   4136   3473    224   13938   722723    19.29
 53   2692   3984   4380   3532    195   14783   708785    20.86
 54   2987   4285   4638   3576    172   15658   694002    22.56
 55   3306   4610   4922   3611    147   16596   678344    24.47
 56   3654   4940   5177   3612    130   17513   661748    26.46
 57   4026   5274   5456   3605    113   18474   644235    28.68
 58   4432   5603   5742   3581     97   19455   625761    31.09
 59   4857   5937   6025   3544     84   20447   606306    33.72
 60   5316   6257   6316   3498     74   21461   585859    36.63
 61   5795   6568   6604   3424     69   22460   564398    39.79
 62   6293   6860   6890   3345     59   23447   541938    43.27
 63   6805   7129   7162   3255     51   24402   518491    47.15
 64   7332   7361   7423   3150     43   25309   494089    51.22
 65   7854   7570   7672   3042     38   26176   468780    55.84
 66   8366   7727   7896   2919     36   26944   442604    60.88
 67   8863   7838   8089   2791     31   27612   415660    66.43
 68   9313   7894   8257   2655     28   28147   388048    72.53
 69   9719   7894   8385   2511     23   28532   359901    79.27
 70  10053   7829   8468   2362     20   28732   331369    86.71
 71  10294   7700   8503   2212     18   28727   302637    94.92
 72  10424   7496   8477   2067     15   28479   273910   103.97
 73  10424   7227   8389   1901     13   27954   245431   110.69
 74  10280   6897   8230   1746     13   27166   217477   124.91
 75   9970   6503   8002   1593     10   26078   190311   137.02
 76   9492   6057   7695   1444     10   24698   164233   150.38
 77   8834   5571   7313   1298      8   23024   139535   165.00
 78   8037   5047   6853   1159      7   21103   116511   181.12
 79   7086   4499   6314   1026      6   18931    95408   198.42
 80   6046   3943   5733    900      5   16621    76477   217.33
 81   4953   3400   5091    784      4   14232    59856   237.77
 82   3871   2862   4421    676      3   11833    45624   259.35
 83   2813   2365   3730    577      2    9487    33791   280.75
 84   1957   1907   3046    489      1    7400    24304   304.48
 85   1232   1498   2396    412      -    5538    16904   327.61
 86    701   1141   1797    340      -    3979    11366   350.08
 87    343    844   1275    277      -    2739     7387   370.76
 88    140    603    844    225      -    1812     4648   389.78
 89     48    408    516    179      -    1151     2836   405.85
 90     14    269    283    141      -     707     1685   419.58
 91      5    171    134    110      -     420      978   429.44
 92      -    111     53     83      -     247      558   442.65
 93      -     56     14     63      -     133      311   452.10
 94      -     28      4     44      -      76      178   457.05
 95      -     14      2     31      -      47      102   460.78
 96      -      5      1     22      -      28       55   509.01
 97      -      -      -     14      -      14       27   518.50
 98      -      -      -      9      -       9       13   692.30
 99      -      -      -      4      -       4        4  1000.00



Mortality Table: Metropolitan White Males, 1911-1916

Age      I     II    III    IVb    IVa     d_x      l_x  1000 q_x
 10     80    153    205     47   1720    2205  1000000     2.21
 11     95    179    274     61   1776    2385   997795     2.39
 12    118    210    350     77   1812    2567   995410     2.58
 13    141    244    444     96   1832    2757   992843     2.78
 14    168    282    550    116   1834    2950   990086     2.98
 15    202    327    671    140   1825    3165   987136     3.21
 16    240    373    810    171   1803    3397   983971     3.45
 17    282    427    960    199   1772    3640   980574     3.71
 18    336    483   1130    233   1733    3915   976934     4.01
 19    393    545   1315    274   1680    4207   973019     4.32
 20    454    611   1514    311   1612    4502   968812     4.65
 21    527    685   1728    358   1539    4837   964310     5.02
 22    599    765   1951    407   1449    5169   959473     5.39
 23    687    845   2184    459   1363    5538   954304     5.80
 24    775    932   2428    515   1279    5929   948766     6.25
 25    874   1024   2674    575   1190    6337   942837     6.72
 26    977   1120   2924    638   1107    6766   936500     7.32
 27   1088   1223   3173    703   1012    7199   929734     7.74
 28   1202   1328   3414    770    923    7637   922535     8.28
 29   1324   1436   3648    839    840    8087   914898     8.84
 30   1473   1549   3879    909    757    8567   906811     9.45
 31   1584   1662   4089    985    684    9004   898244    10.02
 32   1702   1779   4283   1052    614    9430   889240    10.60
 33   1863   1899   4459   1125    545    9891   879810    11.24
 34   2012   2015   4604   1196    485   10312   869919    11.85
 35   2160   2139   4740   1266    427   10732   859607    12.48
 36   2324   2259   4842   1332    378   11135   848875    13.12
 37   2485   2379   4919   1399    335   11517   837740    13.75
 38   2664   2501   4968   1462    296   11891   826223    14.39
 39   2847   2617   4989   1520    258   12231   814332    15.02
 40   3057   2734   4988   1577    226   12578   802101    15.68
 41   3272   2848   4953   1628    192   12893   789523    16.33
 42   3508   2960   4898   1675    163   13204   776630    17.00
 43   3767   3066   4821   1719    143   13516   763426    17.70
 44   4057   3170   4719   1757    120   13823   749910    18.43
 45   4389   3267   4604   1789    100   14149   736087    19.22
 46   4748   3358   4471   1816     90   14483   721938    20.06
 47   5153   3447   4320   1839     76   14834   707455    20.97
 48   5599   3526   4160   1855     61   15201   692621    21.95
 49   6064   3598   3991   1867     60   15590   677420    23.01
 50   6631   3663   3810   1872     42   16018   661830    24.20
 51   7198   3721   3630   1872     35   16456   645812    25.48
 52   7820   3769   3443   1867     30   16929   629356    26.90
 53   8492   3809   3254   1857     22   17434   612427    28.47
 54   9168   3839   3069   1840     10   17926   594993    30.13
 55   9897   3858   2876   1820      1   18452   577067    31.98
 56  10637   3868   2696   1793      -   18994   558615    34.00
 57  11378   3867   2519   1762      -   19526   539621    36.18
 58  12114   3853   2340   1726      -   20033   520095    38.52
 59  12847   3830   2169   1687      -   20533   500062    41.06
 60  13555   3794   2004   1640      -   20993   479529    43.77
 61  14217   3746   1844   1591      -   21396   458536    46.67
 62  14817   3685   1692   1541      -   21735   437140    49.72
 63  15359   3616   1547   1484      -   22005   415405    52.97
 64  15820   3535   1408   1425      -   22188   393400    56.40
 65  16179   3443   1277   1364      -   22263   371212    59.97
 66  16450   3340   1163   1299      -   22242   348949    63.74
 67  16610   3229   1037   1235      -   22111   326707    67.68
 68  16691   3109    930   1166      -   21896   304596    71.89
 69  16591   2981    828   1098      -   21498   282700    76.06
 70  16412   2851    736   1030      -   21029   261202    80.61
 71  16107   2711    649    955      -   20422   240173    85.03
 72  15721   2568    571    892      -   19752   219751    89.88
 73  15225   2423    500    825      -   18973   199999    94.87
 74  14629   2271    434    759      -   18093   181026    99.95
 75  13946   2126    377    696      -   17144   162933   105.22
 76  13225   1976    325    632      -   16158   145789   110.83
 77  12423   1828    278    572      -   15101   129631   116.49
 78  11580   1684    237    515      -   14016   114530   122.38
 79  10729   1543    200    461      -   12933   100514   128.67
 80   9840   1406    167    411      -   11824    87581   135.01
 81   8950   1272    138    363      -   10723    75757   141.54
 82   8092   1144    115    318      -    9669    65034   148.68
 83   7237   1024     98    282      -    8641    55365   156.07
 84   6420    911     79    247      -    7657    46724   163.88
 85   5645    806     65    208      -    6724    39067   172.11
 86   4920    707     53    181      -    5861    32343   181.21
 87   4240    615     43    150      -    5048    26482   190.62
 88   3622    531     34    126      -    4313    21434   201.22
 89   3065    457     27    106      -    3655    17121   213.48
 90   2550    387     22     87      -    3046    13466   226.20
 91   2099    327     16     70      -    2512    10420   241.07
 92   1698    270     14     56      -    2038     7908   257.71
 93   1355    222     11     45      -    1633     5870   278.19
 94   1053    179      8     35      -    1275     4237   300.92
 95    805    143      6     27      -     981     2962   331.20
 96    595    112      5     20      -     732     1981   369.51
 97    412     85      1     14      -     512     1249   409.93
 98    286     62      -     10      -     358      737   485.75
 99    198     27      -      6      -     231      379   609.50
100     95     15      -      4      -     114      148   770.27
101     27      5      -      2      -      34       34  1000.00



Mortality Table: American Coal Miners (1913-1917)

Age      I     II    III     IV     Va     Vb     VI     d_x      l_x  1000 q_x
 18      -     99    124    142   4566      7    366    5304  1000000     5.30
 19      -    114    144    164   4702     10    408    5542   994696     5.57
 20      -    140    168    187   4954     14    452    5915   989154     5.98
 21      -    162    194    214   5196     19    498    6283   983239     6.39
 22      -    190    223    243   5234     27    546    6463   976956     6.62
 23      -    223    250    272   5151     38    597    6531   970493     6.73
 24      -    256    282    307   5067     50    646    6608   963962     6.86
 25      -    298    315    341   4952     69    697    6672   957354     6.97
 26      -    341    349    379   4846     91    749    6755   950682     7.11
 27      -    390    386    421   4748    120    802    6867   943927     7.27
 28      -    440    424    465   4683    156    853    7021   937060     7.49
 29      -    498    461    508   4569    202    903    7141   930039     7.68
 30      -    557    500    560   4413    257    953    7240   922898     7.84
 31      -    622    538    609   4220    326   1002    7317   915658     7.99
 32      -    688    579    663   4000    408   1048    7386   908341     8.13
 33      -    761    618    718   3757    505   1093    7452   900955     8.27
 34      -    837    654    777   3500    618   1133    7519   893503     8.42
 35      -    915    693    840   3233    749   1175    7605   885984     8.58
 36      -    994    732    905   2963    898   1212    7704   878379     8.77
 37      -   1084    775    973   2697   1064   1246    7839   870675     9.00
 38      -   1171    818   1045   2435   1251   1277    7997   862836     9.27
 39      -   1267    867   1124   2184   1452   1305    8199   854839     9.59
 40      -   1364    920   1206   1946   1667   1329    8432   846640     9.96
 41      -   1471    978   1293   1723   1894   1352    8711   838208    10.39
 42      -   1581   1045   1386   1515   2131   1369    9027   829497    10.88
 43      -   1706   1125   1489   1326   2372   1383    9399   820470    11.46
 44      -   1835   1222   1585   1106   2609   1396    9752   811071    12.02
 45      1   1976   1322   1712    883   2841   1403   10133   801319    12.65
 46      6   2132   1444   1837    853   3063   1408   10743   791186    13.58
 47     10   2302   1584   1971    729   3266   1410   11271   780443    14.44
 48     21   2492   1741   2114    619   3443   1408   11838   769172    15.39
 49     32   2706   1918   2265    524   3596   1402   12441   757334    16.48
 50     42   2934   2118   2423    442   3706   1395   13060   744893    17.53
 51     54   3190   2337   2589    368   3790   1383   13711   731833    18.74
 52     73   3470   2567   2764    307   3832   1368   14380   718122    20.02
 53     94   3775   2820   2945    256   3832   1362   15073   703742    21.42
 54    123   4104   3086   3130    210   3790   1331   15774   688669    22.91
 55    153   4437   3356   3313    173   3706   1308   16446   672896    24.44
 56    186   4843   3637   3501    141   3595   1281   17183   656450    26.18
 57    225   5246   3922   3689    115   3443   1252   17892   639267    27.99
 58    268   5656   4192   3872     93   3265   1220   18566   621375    29.88
 59    310   6086   4454   4047     76   3063   1186   19221   602809    31.89
 60    354   6530   4703   4209     61   2841   1148   19846   583588    34.01
 61    402   6970   4936   4364     48   2609   1109   20438   563742    36.25
 62    450   7403   5133   4500     39   2372   1067   20964   543304    38.59
 63    508   7832   5305   4618     30   2131   1023   21447   522340    41.05
 64    573   8230   5438   4718     24   1894    978   21856   500893    43.63
 65    648   8615   5533   4796     19   1667    931   22208   479038    46.36
 66    746   8954   5581   4846     16   1452    884   22478   456830    49.20
 67    875   9255   5596   4871     13   1251    834   22695   434362    52.25
 68   1016   9507   5563   4871      9   1064    786   22814   411667    55.41
 69   1207   9704   5479   4841      6    898    736   22871   388843    58.81
 70   1437   9846   5368   4786      6    749    686   22868   365972    62.49
 71   1702   9917   5196   4701      4    618    637   22775   343104    66.38
 72   2008   9931   4999   4592      4    505    588   22627   320329    70.64
 73   2334   9871   4771   4460      2    408    540   22386   297702    75.20
 74   2677   9747   4513   4302      2    326    494   22061   275316    80.19
 75   3028   9567   4233   4125      2    257    449   21661   253255    85.49
 76   3332   9307   3941   3929      1    202    408   21120   231604    91.19
 77   3610   9001   3638   3722      1    156    366   20494   210484    97.37
 78   3827   8643   3322   3496      -    120    329   19737   189990   103.88
 79   3967   8237   3012   3267      -     91    293   18867   170253   110.82
 80   4020   7799   2704   3029      -     69    258   17879   151386   118.19
 81   3980   7327   2411   2788      -     50    226   16782   133507   125.70
 82   3916   6803   2123   2552      -     38    198   15630   116725   133.99
 83   3658   6315   1846   2313      -     27    171   14330   101095   141.75
 84   3370   5801   1596   2086      -     19    147   13018    86765   150.04
 85   3040   5286   1366   1862      -     14    126   11693    73747   158.56
 86   2684   4776   1151   1650      -     10    106   10376    62054   167.21
 87   2305   4281    957   1448      -      7     88    9086    51678   176.82
 88   1937   3809    789   1261      -      5     71    7872    42592   184.82
 89   1584   3353    640   1085      -      3     60    6726    34720   193.69
 90   1269   2924    513    927      -      2     48    5683    27996   203.09
 91    986   2535    404    784      -      2     38    4748    22312   212.89
 92    747   2168    310    660      -      1     29    3905    17564   222.83
 93    551   1845    231    531      -      -     22    3180    13659   232.81
 94    396   1545    170    428      -      -     17    2556    10479   243.92
 95    278   1279    119    338      -      -     12    2026     7923   255.71
 96    198   1050     79    261      -      -      7    1594     5897   270.31
 97    126    846     48    196      -      -      5    1219     4303   283.29
 98     85    672     26    140      -      -      2     925     3084   299.94
 99     70    525      9     96      -      -      -     701     2159   324.69
100     35    401      -     59      -      -      -     495     1458   339.51
101     24    298      -     29      -      -      -     351      963   364.48
102     19    217      -      4      -      -      -     240      612   392.16
103     14    149      -      -      -      -      -     163      372   438.17
104     10     97      -      -      -      -      -     107      209   511.96
105      8     56      -      -      -      -      -      63      102   617.65
106      6     25      -      -      -      -      -      31       39   794.87
107      3      2      -      -      -      -      -       5        8   625.00
108      2      -      -      -      -      -      -       2        3   666.67
109      1      -      -      -      -      -      -       1        1  1000.00

ADDENDA II



In order to show a rapid application of frequency
curve methods to the graduation of mortality tables
when the number of lives exposed to risk at various
ages is known, the following data, relating to applicants
who had been rejected for life assurance on
account of impaired health by Scandinavian assurance
companies, are instructive. The original statistics,
as collected by a committee of the insurance
companies, were first published in the quinquennial
report (1910-1915) of the Danish Government Life
Assurance Institution (the Statsanstalt) for 1917.

The material related to Scandinavian and Finnish
applicants who prior to 1893 (and in the case
of two Danish companies before 1899) had been rejected
for life assurance. By a special investigation,
the committee followed up these rejections and sought
to establish whether the applicants were alive on July
1, 1899, or had died before that date. Detailed reports
for the full period during which the risks were
under observation were available for 8,208 individual
applicants. For 2,023 applicants complete data were
not available.

The final statistical results of the Statsanstalt's in- 
vestigation are shown in the following summary 
table: 





TABLE I.

Mortuary Experience of Rejected Risks of Scandinavian
Life Companies.

Attained Age    No. Exposed to Risk    Number of Deaths
15-19                    434                    6
20-24                  3,831                   28
25-29                 11,405                  145
30-34                 17,644                  233
35-39                 19,442                  318
40-44                 17,600                  324
45-49                 13,971                  296
50-54                 10,179                  295
55-59                  6,640                  264
60-64                  3,927                  194
65-69                  1,995                   96
70-74                    836                   71
75-79                    306                   32
80-84                     98                   20
85-89                     12                    3

The exposed to risk by separate ages and the
corresponding deaths are shown in Table II in Columns
2 and 3, from which we obtain without difficulty the
crude or ungraduated mortality rates, as shown in
Column 4.
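
The crude rates of Column 4 are simply the deaths of Column 3 divided by the exposed to risk of Column 2. A minimal sketch, using the first five ages of Table II as printed (the function name `crude_rate` is hypothetical):

```python
def crude_rate(deaths, exposed):
    """Ungraduated mortality rate: deaths divided by lives exposed to risk."""
    return deaths / exposed


# (age, exposed to risk, deaths) for ages 15 to 19, taken from Table II
rows = [(15, 11, 0), (16, 31, 1), (17, 64, 1), (18, 121, 0), (19, 207, 4)]
crude = {age: crude_rate(deaths, exposed) for age, exposed, deaths in rows}
for age, rate in crude.items():
    print(age, f"{rate:.5f}")
```

With so few lives at each age the crude rates jump about irregularly, which is the reason a graduation is wanted at all.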

We next assume a purely hypothetical frequency
distribution of the exposed to risk, according to age,
represented by a Laplacean normal probability curve
with its mean or origin at age fifty and a dispersion
equal to 12.5 years, as shown in Column 5. The frequency
distribution of the number of deaths on the
basis of the ungraduated mortality rates in Column 4
and the above-mentioned normal probability curve is
shown in Column 6, which may be considered as an
ungraduated compound frequency curve.*
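
The construction of Columns 5 and 6 can be sketched as follows. The scale of 100,000 times the ordinate of the standard normal curve is an assumption on my part; it is not stated in the text, but it reproduces the values 792, 987 and 1223 printed in Column 5 for ages 15 to 17. The function names are hypothetical.

```python
import math


def hypothetical_exposure(age, mean=50.0, dispersion=12.5, scale=100_000.0):
    """Column 5: normal-curve exposure; the scale is an assumed table multiplier."""
    z = (age - mean) / dispersion
    return scale * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def crude_death_curve(age, crude_rate):
    """Column 6: hypothetical exposure multiplied by the crude rate of Column 4."""
    return hypothetical_exposure(age) * crude_rate


print(round(hypothetical_exposure(15)), round(hypothetical_exposure(16)))
print(round(crude_death_curve(16, 1 / 31)))
```

Because the exposure curve is fixed in advance, all the irregularity of the crude rates is carried into Column 6, and it is that column which is afterwards graduated.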

Arranged in quinquennial age intervals, this latter
frequency distribution is shown in the following summary
table:

Ages           No. of Deaths
13-17                  51
18-22                  75
23-27                 329
28-32                 711
33-37               1,464
38-42               2,498
43-47               3,649
48-52               5,377
53-57               6,238
58-62               6,232
63-67               5,254
68-72               3,605
73-77               2,536
78-82               1,425
83-87               1,169
88-92                 351
93 or over             95

Total              41,069

The above frequency distribution is now subjected
to a graduation by means of the Laplace-Charlier
or Gram-Charlier frequency function. The mathematical
calculations give the following parameters:

Mean Age      57.75 years
Dispersion    13.32 years
Skewness      -0.0031
Excess        -0.0037

* A slight adjustment was made in the figures in column (6) corresponding
to age 70, and in the age groups above the age of 88.

Applying these parameters to standard probability
tables we obtain the usual Laplace-Charlier frequency
curve. Distributing the 41,069 individual
deaths according to this frequency curve we obtain
column (7), which is the graduated death curve corresponding
to the hypothetical exposure as given by
column (5). The final mortality rates per 1,000 of
exposed to risk are then found by dividing (7) by
(5) and are shown in column (8).
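
A sketch of how column (7) might be computed from the four parameters is given below. It assumes the Charlier form of the series, $F(x) = \frac{N}{\sigma}\left[\varphi(z) + \beta_3\varphi'''(z) + \beta_4\varphi''''(z)\right]$ with $z = (x - M)/\sigma$, and it assumes that the printed "skewness" and "excess" are the coefficients $\beta_3$ and $\beta_4$ in that form; neither assumption is confirmed by the text. Under these assumptions the sketch gives about 7.1 deaths at age 16 and about 19.6 at age 20, against 7.1 and 19.7 printed in column (7).

```python
import math


def graduated_deaths(age, total=41_069.0, mean=57.75, dispersion=13.32,
                     skewness=-0.0031, excess=-0.0037):
    """Gram-Charlier (type A) death curve under the assumed Charlier convention."""
    z = (age - mean) / dispersion
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    he3 = z ** 3 - 3.0 * z               # Hermite polynomial He3
    he4 = z ** 4 - 6.0 * z ** 2 + 3.0    # Hermite polynomial He4
    # phi'''(z) = -He3(z) * phi(z) and phi''''(z) = He4(z) * phi(z)
    series = phi * (1.0 - skewness * he3 + excess * he4)
    return total / dispersion * series


print(round(graduated_deaths(16), 1), round(graduated_deaths(20), 1))
```

The rate of column (8) would then follow by dividing such a value by the hypothetical exposure at the same age and multiplying by 1,000.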

In order to show how close the graduation by 
means of frequency curves agrees with the actual 
observations, I have made a calculation of the 
"actual" to the "expected" deaths by quinquennial 
age intervals as shown in the following table: 

TABLE III.

Comparison between "Actual" and "Expected"
Deaths on the Basis of the Graduated Mortality
Rates of the Scandinavian Mortality Table for
Rejected Lives

Ages      No. Exposed to Risk    Actual Deaths    Expected Deaths
15-19              434                   6               3.4
20-24            3,831                  28              37.6
25-29           11,405                 145             133.4
30-34           17,644                 233             242.2
35-39           19,442                 318             314.3
40-44           17,600                 324             336.8
45-49           13,971                 296             321.8
50-54           10,179                 295             287.2
55-59            6,640                 264             234.8
60-64            3,927                 194             178.6
65-69            1,995                  96             119.5
70-74              836                  71              67.4
75-79              306                  32              33.8
80-84               98                  20              15.1
85-89               12                   3               2.5

Total          108,320               2,325           2,328.4
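
The totals of Table III, and the sum of squared deviations used in a least-squares comparison of two graduations, can be checked with a few lines. The figures are those printed in the table; the variable names are hypothetical.

```python
exposed = [434, 3831, 11405, 17644, 19442, 17600, 13971, 10179,
           6640, 3927, 1995, 836, 306, 98, 12]
actual = [6, 28, 145, 233, 318, 324, 296, 295, 264, 194, 96, 71, 32, 20, 3]
expected = [3.4, 37.6, 133.4, 242.2, 314.3, 336.8, 321.8, 287.2,
            234.8, 178.6, 119.5, 67.4, 33.8, 15.1, 2.5]

total_exposed = sum(exposed)
total_actual = sum(actual)
total_expected = sum(expected)
# Unweighted sum of squared deviations between actual and expected deaths
sum_of_squares = sum((a - e) ** 2 for a, e in zip(actual, expected))

print(total_exposed, total_actual, round(total_expected, 1))
print(round(total_actual / total_expected, 4), round(sum_of_squares, 1))
```

The same sum of squares computed for another graduation of the same data gives a simple, if unweighted, basis for comparing the two.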

Considering the somewhat meager experience on 
which the graduation was based, I think it must be 
admitted that the method of frequency curves comes 
surprisingly close to the actual facts. In this connec- 
tion it is of interest to note that the actuaries of the 
Danish Statsanstalt made a graduation of the above 
data on the basis of Makeham's method and obtained 
from least square methods the following values for 
the constants.¹

A = 0.006 

log B = 7.0566 — 10 

log C = 0.025 
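
For orientation, these constants can be put into Makeham's formula in its usual form, $\mu_x = A + Bc^x$. The sketch assumes that the logarithms are common (base 10) logarithms and that the constants refer to the force of mortality; the text does not state either point, and the function name is hypothetical.

```python
A = 0.006
LOG10_B = 7.0566 - 10.0   # assumed to be a common logarithm
LOG10_C = 0.025           # assumed to be a common logarithm


def makeham_force(age):
    """Makeham's law mu_x = A + B * c**x with the constants quoted above."""
    return A + 10.0 ** (LOG10_B + LOG10_C * age)


for age in (30, 50, 70):
    print(age, round(makeham_force(age), 5))
```

At age 50 this gives a little over 0.026, which is of the same order as the 287.2 expected deaths among 10,179 exposed at ages 50-54 in Table III.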

The "expected" deaths according to this latter 
graduation, and on the basis of the above experience, 
amount in total to 2,317 as against 2,325 "actual" 
deaths and 2,328 "expected" deaths according to the 
frequency curve method. Viewed from the standpoint 
of the principle of least squares, it is also found 
that the sum of the squares of the deviations is smaller 
under the frequency curve method than under the 
method of Makeham, which seems to be pretty good 
evidence of the soundness of the method, in spite of 
the fact that I have worked throughout with unweighted 
observations. If properly chosen weights 
were applied to the observations, even closer results 
could be obtained. 

^ See formula (6), page 192 of the Institute of Actuaries Text Book, Life 
Contingencies, by E. F. Spurgeon, London, 1922. 
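
The least-squares criterion mentioned above can be illustrated as follows. The text does not state on which quantities the deviations were measured, so this sketch simply uses the quinquennial "actual" and "expected" deaths of Table III for the frequency curve method. The weights 1/expected in the second calculation are a hypothetical choice made here for illustration only, and the name `sum_of_squares` is likewise an assumption.

```python
# Actual and expected deaths by quinquennial age group, copied from Table III.
actual = [6, 28, 145, 233, 318, 324, 296, 295, 264, 194, 96, 71, 32, 20, 3]
expected = [3.4, 37.6, 133.4, 242.2, 314.3, 336.8, 321.8, 287.2,
            234.8, 178.6, 119.5, 67.4, 33.8, 15.1, 2.5]


def sum_of_squares(observed, fitted, weights=None):
    """Sum of (optionally weighted) squared deviations between two series."""
    if weights is None:
        weights = [1.0] * len(observed)  # unweighted observations
    return sum(w * (o - f) ** 2 for o, f, w in zip(observed, fitted, weights))


unweighted = sum_of_squares(actual, expected)

# Hypothetical weights: inversely proportional to the expected deaths.
weights = [1.0 / e for e in expected]
weighted = sum_of_squares(actual, expected, weights)

print(round(unweighted, 2), round(weighted, 2))
```

The same function applied to the expected deaths of a rival graduation would give the figure to be compared; the Makeham values by age group are not reproduced in the text, so that comparison is not attempted here.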



TABLE II. 

Mortality Experience of Rejected Scandinavian Risks (Male). 

  (1)       (2)      (3)        (4)           (5)        (6)        (7)        (8)
  Age   Exposed   No. of  (3) : (2)  Hypothetical  (5) x (4)  Graduated  (7) : (5)
        to Risk   Deaths                 Exposure      Crude      Death    1000 qx
                                                 Death Curve      Curve

   15        11        0    0.00000           792          0        5.6       7.07
   16        31        1    0.03226           987         32        7.1       7.07
   17        64        1    0.01562          1223         19        9.2       7.52
   18       121        0    0.00000          1506          0       11.7       7.77
   19       207        4    0.01932          1842         36       15.4       8.36
   20       340        1    0.00294          2239          7       19.7       8.80
   21       501        1    0.00200          2705          5       25.0       9.24
   22       719        6    0.00834          3246         27       30.8       9.49
   23       982        6    0.00611          3871         24       38.8      10.02
   24      1289       14    0.01086          4586         50       47.8      10.42
   25      1619       22    0.01359          5399         73       58.2      10.78
   26      1986       23    0.01158          6316         73       70.6      11.18
   27      2287       34    0.01487          7341        109       85.0      11.58
   28      2597       29    0.01117          8478         95      101.7      12.00
   29      2916       37    0.01269          9728        123      120.5      12.39
   30      3180       38    0.01195         11092        133      142.0      12.80
   31      3395       50    0.01473         12566        185      166.4      13.24
   32      3564       44    0.01235         14146        175      193.5      13.68
   33      3700       46    0.01243         15822        197      223.4      14.12
   34      3806       55    0.01445         17585        254      257.0      14.61
   35      3882       48    0.01236         19419        240      293.3      15.10
   36      3943       64    0.01623         21307        346      332.8      15.62
   37      3921       72    0.01836         23230        427      375.3      16.16
   38      3880       66    0.01701         25164        428      420.0      16.69
   39      3816       68    0.01782         27086        483      467.7      17.27
   40      3737       66    0.01766         28969        512      517.6      17.87
   41      3637       63    0.01732         30785        533      566.9      18.41
   42      3539       59    0.01667         32506        542      623.3      19.17
   43      3426       62    0.01810         34105        617      678.2      19.89
   44      3261       74    0.02269         35553        807      732.7      20.61
   45      3079       67    0.02176         36827        801      787.8      21.39
   46      2941       61    0.02074         37903        786      842.4      22.23
   47      2793       46    0.01647         38762        638      895.1      22.97
   48      2653       61    0.02299         39387        906      945.9      24.02
   49      2505       61    0.02435         39767        968      994.3      25.00
   50      2348       61    0.02598         39894       1036     1039.0      26.04
   51      2184       65    0.02976         39767       1183     1079.9      27.16
   52      2024       66    0.03261         39387       1284     1116.0      28.33
   53      1882       59    0.03135         38762       1215     1147.4      29.53
   54      1741       44    0.02527         37903        958     1173.3      30.96
   55      1610       62    0.03851         36827       1418     1193.0      32.39
   56      1447       60    0.04147         35553       1474     1206.9      33.95
   57      1308       45    0.03440         34105       1173     1214.3      35.60
   58      1189       47    0.03953         32506       1285     1214.9      37.37
   59      1086       50    0.04604         30785       1417     1209.0      39.27
   60       966       44    0.04555         28969       1320     1197.0      41.32
   61       871       35    0.04019         27086       1089     1178.8      43.52
   62       786       35    0.04453         25164       1121     1154.2      45.87
   63       701       44    0.06277         23230       1458     1124.6      48.41
   64       603       36    0.05970         21307       1272     1090.1      51.16
   65       518       22    0.04247         19419        825     1050.7      54.11
   66       453       24    0.05298         17585        932     1006.3      57.22
   67       392       19    0.04847         15822        767      960.1      60.68
   68       340       16    0.04706         14146        666      909.6      64.30
   69       291       15    0.05155         12566        648      858.4      68.31
   70       244       25    0.10246         11092       1136      804.2      72.60
   71       193       17    0.08808          9728        857      750.9      77.19
   72       158       13    0.08228          8478        698      696.7      82.06
   73       132        9    0.06818          7341        501      642.4      87.51
   74       109        7    0.06422          6316        406      589.1      93.27
   75        91        8    0.08791          5399        475      537.7      99.69
   76        74       10    0.13514          4586        620      486.8     106.15
   77        58        8    0.13793          3871        534      440.3     113.74
   78        45        4    0.08889          3246        289      393.8     121.32
   79        37        2    0.05405          2705        146      351.9     130.09
   80        31        5    0.16129          2239        361      311.8     139.26
   81        24        6    0.25000          1842        461      274.5     149.02
   82        18        2    0.11112          1506        168      241.6     160.42
   83        15        4    0.26667          1223        326      209.5     171.30
   84         9        3    0.33334           987        329      181.6     183.89
   85         6        2    0.33334           792        264      155.9     196.84
   86         3        0    0.00000           631          0      133.4     211.41
   87         2        1    0.50000           499        250      113.4     227.26
   88         2        1    0.50000           393        197       95.6     243.00
   89       0.5        -    0.50000           307        154       79.2     257.98

Note: The observations above age 87 are not reliable. 